THE PREDICTIVE VALUE OF GENERAL INTELLIGENCE
TESTS IN THE SELECTION OF JUNIOR
ACCOUNTANTS AND BOOKKEEPERS


KENYON J. SCUDDER 
Whittier State School, California 


This paper is a report of a study of the training records of
disabled veterans of the World War who followed the vocational
objectives of Junior Accountant or Bookkeeper. To qualify for
vocational training, a man had to show a disability which
prevented his return to his pre-war occupation. These men were
trained under the supervision of the Los Angeles Regional Office
of the United States Veterans’ Bureau. One hundred seventy of
the two hundred sixty-four cases studied completed their
vocational training in these objectives and have been in
employment between three and four years. Each case was given
the Terman Group Test of Mental Ability at some time in the
course of his training. The data used herein were obtained by
questionnaire: one hundred sixty-five questionnaires were sent
to this group in the spring of 1927, and one hundred three
replied. Of those who replied, 32.0 per cent made raw scores
below 125 on the Terman Group Test. Of the total group studied,
only 22.5 per cent made raw scores below 100. The group replying
included somewhat more men of the lower intelligence stratum
than would be expected in a representative sample of the total
group of 264.

The fields of junior accountancy and bookkeeping are similar
in foundation work. Before allowing a trainee to follow the
objective of junior accountant, we preferred that he show a
Terman score of at least 125, together with at least a grade
school education. The score of 125 was selected on the basis
of a study of twelve hundred trainees given general intelligence
tests: we found that those scoring less than 125 on the Terman
Group Test experienced great difficulty with college work and
in most instances failed in their subjects. Setting the minimum
at a score of 100 for both of these occupational choices gave
the benefit of the doubt to the individual who might possess
qualities other than superior intelligence that contribute to
success, such as “determination” and “industry.”

Of the two hundred sixty-four cases selected, one hundred 
seventy completed their vocational training and were declared 
rehabilitated and returned to satisfactory employment. The 
other ninety-four cases were discontinued for various reasons. 
Complete data were available on one hundred seventy cases. 

The median score on the Terman Test for the entire two 
hundred sixty-four cases was 129; for the one hundred seventy 
rehabilitated cases the median score was 142; and for the 
ninety-four in the discontinued group the median score was 112, 
a difference of thirty points between those rehabilitated and 
those discontinued. Such a difference is significant when it is 
realized that a man was discontinued for failure on the job, and 
not on the test. In our use of the term, rehabilitation means 
more than the mere successful completion of a training pro- 
gram; it means, in addition, the overcoming of a vocational 
handicap and the return to useful, gainful employment. 

With the large number of cases handicapped by factors
interfering with success (notably physical disability), it is
surprising that so many out of this group of one hundred seventy
were successfully rehabilitated. Their high level of intelligence
was probably a significant stabilizing factor contributing to
their success. Table 1 gives a synopsis and comparison of the
salient points of the two groups. In addition to the facts
already mentioned in regard to intelligence, “Wrong Attitude” is
recorded in seventy-eight of the ninety-four discontinued cases,
showing it to be a second contributing cause of failure in 83 per
cent of the cases. Among this discontinued group we find
forty-three scoring above 125. What was wrong with this latter
group of superior mentality? In eighteen cases the man’s
disability entered in as a discouraging factor, probably
influencing the discontinuance. In contrast, of the fifty-one
scoring below 125, only six were seriously disabled. The
disability, then, was a definite contributing cause of the
failure of this superior group. We also find “Wrong Attitude”
recorded in thirty-two of the forty-three cases, a second
contributing factor to failure.


TABLE 1
Comparison of successfully rehabilitated and “discontinued”
groups (N = 264)

                                          DISCONTINUED  REHABILITATED
Number of cases..........................      94            170
Median score on Terman Test..............     112            142
Range of scores for group................   18-194         37-216
Number and per cent scoring above 100....  57 (61%)      145 (85%)
Number and per cent scoring below 100....  37 (39%)       25 (15%)
Average number of months in training.....   23.68          39.33
Average school grade (pre-war)...........    8.84           9.76
Average age on entering training.........   27.07          28.35
Average per cent of disability rating....   29.07          31.80
Number and per cent of wrong attitude....  78 (83%)       37 (22%)











The ultimate success of any program of vocational rehabilita- 
tion can only be measured in terms of actual achievement, such 
as satisfactory employment, regular promotions, increased 
wage and a high degree of morale. One hundred twenty or 
70 per cent of the one hundred seventy cases were rehabilitated 
in the years 1923 and 1924 and have been in employment 
between three and four years. A study of their achievement 
in employment should afford a rather accurate check upon both 
the predictive value of intelligence tests and the results of the 
training program. Table 2 gives a synopsis of the result of 
this employment follow-up. 

Of the one hundred three cases replying, eighty-five are still
employed in their training objective or in work closely related;
eleven report a serious physical disability interfering with
steady employment; and only seven failed and are now unemployed.








TABLE 2
Employment follow-up, Spring 1927 (N = 103)

                                       TOTAL    TERMAN GROUP TEST SCORE
                                       GROUP    ABOVE 125    BELOW 125
Number of cases.....................    103        70        33 (32.0%)
 1. Still employed as Junior
    Accountants.....................     57        46           11
 2. Still employed as Bookkeepers...      1        [?]          [?]
 3. Still employed in work related
    to training objective...........     27        21            6
 4. Still employed in training
    objective.......................     55        41           14
 5. Permanently employed............     63        47           16
 6. Temporarily employed............     35        20           15
 7. Received promotions.............     54        42           12
 8. Have not received promotions....     35        18           17
 9. Good prospects for promotion....     43        31           12
10. Poor prospects for promotion....     34        17           17
11. Total pre-war monthly wage for
    group...........................  $9,180    $6,993       $2,187
12. Total monthly wage for group at
    rehabilitation..................  $9,282    $6,642       $2,640
13. Total monthly wage for group,
    March 1, 1927...................  $12,466   $9,148       $3,318
14. Total monthly increase over
    [illegible].....................  $5,420    $4,209       $1,243
15. Broken down physically..........     11         8            3
16. Failed on the job...............      7         3            4
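Because the two Terman groups in Table 2 differ in size (70 and 33 cases), their counts are easier to compare as rates. The sketch below converts a few rows into percentages of each group; the choice of rows and the helper name `rate_by_group` are illustrative, not part of the original study, and since the rows overlap the rates need not sum to 100.

```python
# Counts copied from Table 2, as (above 125, below 125) on the Terman Group Test.
GROUP_SIZE = {"above_125": 70, "below_125": 33}

ROWS = {
    "Received promotions": (42, 12),
    "Good prospects for promotion": (31, 12),
    "Permanently employed": (47, 16),
    "Failed on the job": (3, 4),
}


def rate_by_group(above: int, below: int) -> tuple[float, float]:
    """Express each count as a percentage of its own Terman group (one decimal)."""
    return (
        round(100 * above / GROUP_SIZE["above_125"], 1),
        round(100 * below / GROUP_SIZE["below_125"], 1),
    )


share_below = round(100 * GROUP_SIZE["below_125"] / 103, 1)
print(f"Share of respondents below 125: {share_below}%")  # 32.0%, as in Table 2

for label, (above, below) in ROWS.items():
    pct_above, pct_below = rate_by_group(above, below)
    print(f"{label:<30} above 125: {pct_above:5.1f}%   below 125: {pct_below:5.1f}%")
```

For example, 42 of the 70 men above 125 (60.0 per cent) had received promotions, against 12 of the 33 below 125 (36.4 per cent).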














One of the measures of successful rehabilitation lies in the
revived earning capacity of the trainee. On entering training,
as stated above, he was considered to be 100 per cent disabled
for his particular vocation. We can consider rehabilitation
100 per cent successful when the trainee is able to earn as much
as he did before he became disabled, that is, before the war.
Should his present earning capacity exceed his pre-war status,
remembering that he still carries a disability, then it may be
said that his success has gone beyond expectation. The earning
capacity of these one hundred seventy cases at the time of
rehabilitation reveals some very interesting and stimulating
facts.


Total pre-war wage per month....................... $18,166.00
Average pre-war wage per month....................     106.85
Total monthly wage at rehabilitation..............  22,734.00
Average monthly wage at rehabilitation............     133.72
Average monthly increase in earning capacity
  over the pre-war wage...........................      26.87
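The averages in this list can be recomputed from the two totals. Dividing by 170 gives 106.8588... and 133.7294..., so the printed 106.85 and 133.72 look truncated rather than rounded to the cent; that convention is an inference from the numbers, not something the paper states. On the same convention the average monthly increase is (22,734 − 18,166) / 170, or 26.87.

```python
import math

CASES = 170
TOTAL_PRE_WAR = 18_166.00   # total monthly wage of the 170 men before the war ($)
TOTAL_AT_REHAB = 22_734.00  # total monthly wage at rehabilitation ($)


def truncate_to_cents(amount: float) -> float:
    """Drop fractions of a cent instead of rounding them (assumed convention).

    The small epsilon guards against binary floating-point error, e.g. a value
    stored as 106.84999... that should read 106.85."""
    return math.floor(amount * 100 + 1e-9) / 100


avg_pre_war = truncate_to_cents(TOTAL_PRE_WAR / CASES)
avg_at_rehab = truncate_to_cents(TOTAL_AT_REHAB / CASES)
avg_increase = truncate_to_cents((TOTAL_AT_REHAB - TOTAL_PRE_WAR) / CASES)

print(avg_pre_war, avg_at_rehab, avg_increase)  # 106.85 133.72 26.87
```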


This in itself is a satisfactory accomplishment. The employment
follow-up of March 1, 1927, shows even more gratifying results.
A comparison of pre-war, rehabilitation and present wages on the
returns shows some very startling facts. On the returns, sixteen
were on a commission basis or in business for themselves and
failed to state their present salary; twelve were temporarily
unable to work for physical reasons; four were out of work; and
one was a student in college. These cases were eliminated from
the estimate of present earning capacity for the sake of
consistency, although most of those on commission were earning
$150.00 per month or better. Although we have reason to believe
the pre-war earnings were often over-stated, the man’s own
statement is given here.

Table 3 shows sixty-nine of the one hundred three cases on
which pre-war, rehabilitation and present wages were available.
The following figures are of interest:


Total pre-war monthly wage for the group.......... $ 9,180.00
Total monthly wage at rehabilitation..............   9,282.00
Total monthly wage as of March 1, 1927............  12,466.00
This shows a monthly increase of
  Rehabilitation over pre-war salary of...........     102.00
  [illegible].....................................   1,244.00
  Present over rehabilitation.....................   3,184.00
[illegible].......................................  32,208.00
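The increases listed here are differences between the three monthly totals for the sixty-nine cases, and a yearly figure is twelve times a monthly one. The sketch below reproduces the 102.00 and 3,184.00 entries; the annualization is an illustration that assumes the monthly difference holds for a full year, and the label names are hypothetical.

```python
# Monthly totals for the sixty-nine cases with complete wage data (dollars).
totals = {
    "pre_war": 9_180,
    "rehabilitation": 9_282,
    "present": 12_466,  # as of March 1, 1927
}

monthly_gain = {
    "rehabilitation over pre-war": totals["rehabilitation"] - totals["pre_war"],
    "present over rehabilitation": totals["present"] - totals["rehabilitation"],
    "present over pre-war": totals["present"] - totals["pre_war"],
}

# Illustrative annualization: assumes each monthly difference holds for twelve months.
yearly_gain = {label: 12 * gain for label, gain in monthly_gain.items()}

for label, gain in monthly_gain.items():
    print(f"{label:<30} ${gain:>6,} a month   ${yearly_gain[label]:>7,} a year")
```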
TABLE 3
Increased earning capacity following rehabilitation: upper,
middle and lowest ten cases

  PRE-WAR   REHABILITATION   PRESENT WAGE,   INCREASE OF PRESENT
  WAGE      WAGE             MARCH 1927      OVER PRE-WAR WAGE
   $75         $150             $300              $225
   100          135              220               120
   140          130              200                60
   115          135              225               110
    10          136                0                60
    60          140              300               240
   100          200              300               200
    75          135              162                87
    85          145              200               115
    55          175              298               243
    60          100              130                70
   120          150              130                10
    75          150              200               125
   100          100              150                50
    60          145              115                55
   150          150              250               100
   110          125              150                40
    80          125              175                95
   100          100              110                10
   100          150              180                80
   150          200              140                 0
   150          125              100                 0
   150          110              175                25
   100          140              225               125
    -            -                -                 -
    42          110              110                68
   125          140              175                50
    90          140              155                65
    45          160               90                45

Average months in training:
  Upper ten............................................ 38.2
  Middle ten........................................... 40.8
  Lowest ten........................................... 49.0
Average monthly increase:
  Upper ten............................................ $145.00
  Middle ten...........................................   73.10
  Lowest ten...........................................   41.80
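In every row that can be checked, the last column of Table 3 equals the present wage minus the pre-war wage, with 0 entered where the present wage is lower (as in two of the lowest cases). The sketch below states that rule; the rule is read off the table rather than described in the text, and `increase_over_pre_war` is a hypothetical name.

```python
# (pre-war, rehabilitation, present) monthly wages in dollars from five rows of
# Table 3; the last two rows show present wages below the pre-war wage.
rows = [
    (75, 150, 300),
    (100, 135, 220),
    (60, 100, 130),
    (150, 200, 140),
    (150, 125, 100),
]


def increase_over_pre_war(pre_war: int, present: int) -> int:
    """Present wage minus pre-war wage, entered as 0 when the man now earns less."""
    return max(0, present - pre_war)


for pre_war, _rehab, present in rows:
    print(pre_war, present, increase_over_pre_war(pre_war, present))
```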
With an increased earning capacity of over forty thousand
dollars ($40,000.00) a year beyond what they were able to earn
before the war, the rehabilitation of this group has surely been
worthwhile. The presence of a high morale is seen in the
enthusiastic and commendatory statements voluntarily submitted
by these rehabilitated trainees. This high accomplishment in
employment indicates not only careful selection and thorough
preparation but also the ability to carry on by themselves, thus
demonstrating a satisfactory adjustment to environment.

Table 3 gives a comparison of the ten highest, the ten lowest
and the ten middle-most cases studied. In the upper ten cases,
seven are still employed in their training objectives, two are
badly disabled and in the hospital, and the remaining one is a
bond salesman at $200.00 a month. All but one are still
suffering from a serious physical handicap, rated by the
Veterans’ Bureau at from 20 to 60 per cent; only one showed a
wrong attitude, and he was nevertheless successfully
rehabilitated. Eight have been promoted and nine receive
salaries which exceed their pre-war earnings. All ten show a
consistent rehabilitation, and this is especially emphasized by
the trainees’ own voluntary remarks.

In the middle ten cases, six are still employed in their
training objective; two are in the hospital or recuperating; one
found the work too confining but is employed as a real estate
salesman on commission; and the other, a colored man, is running
an elevator, stating that opportunities in bookkeeping for his
race are very limited. All but one are still suffering from a
serious disability, rated from 20 to 75 per cent. None of the
ten indicated a wrong attitude during training. Seven have
enjoyed promotions, two are unable to work, and the other has
been unable to secure employment as a bookkeeper on account of
his race. Eight have exceeded their pre-war wage and nine show a
consistent rehabilitation. This is again emphasized by the
trainees’ remarks, which express both satisfaction with their
training and a friendly attitude toward the Bureau.










The upper and middle groups scored 140 or above on the
Terman Test. In the lower ten cases we find a different story.
This was to be expected, with a range of scores from 37 to 92
for the group. An honest attempt was made, however, on the part
of the Bureau to rehabilitate these men. From the standpoint of
employment as junior accountants or bookkeepers the effort was
wasted. Eight out of ten were rehabilitated in other
occupations. While seven are still employed in the occupations
followed at the time they completed training, they cannot be
considered consistent rehabilitations from the standpoint of
this study. We were endeavoring to make junior accountants and
bookkeepers out of this group, and in eight out of the ten cases
we failed.

Of this lowest ten, four were badly handicapped by poor
health. One indicated a “wrong attitude.” All but two were
still suffering from a serious physical handicap, ranging from 20
to 60 per cent. Four have enjoyed promotions and six have
exceeded their pre-war wage. A glance at the present employment
of the group is very enlightening. One completed training as a
junior accountant and is carrying on as an auditor. Two finished
as bookkeepers: one succeeded, the other failed. We find the
following distribution: 1 Junior Accountant (successful);
2 Bookkeepers (one successful, one failed); 2 Salesmen;
2 Collectors; 1 Pipe fitter; 1 Fireman; 1 Tool room helper.

These lower ten cases emphasize the fact that success in
junior accountancy and bookkeeping depends to a large degree
upon the mental capacity of the individual, and that a minimum
Terman Group Test score of 100 for bookkeepers and 125 for
junior accountants is, in most cases, essential to such success.
It should be added that one factor in the success of those who
scored below 100 on the Terman Group Test and nevertheless
succeeded in their vocational objectives was their arithmetical
ability, as measured by the Woody-McCall test, which compensated
somewhat for their low intelligence rating.
THE MENTALITY OF THE CHINESE AND JAPANESE¹


HSIAO HUNG HSIAO 
Berkeley, California 


This review is intended to cover all the psychological studies
that have been made of the Chinese and Japanese. By
psychological studies are meant here studies by means of tests;
mere expressions of opinion based upon sporadic observations are
therefore excluded. Among these we may reckon such a book as
R. Wilhelm’s “Die Seele Chinas” (1925) and such articles as
“Chinese Psychically and Sentimentally Viewed” (Asiatic Review,
S. 19: 225-31, April, 1923), “Emotional Nature of the Chinese,”
by P. S. Buck (Nation, 123: 269-70, S. 22, 1926) and “Mental
Characteristics of the Japanese” (Scribner’s Magazine, xvii,
79-92). Nor are we to include any morphological or anatomical
studies, among which may be mentioned “Beiträge zur
Rassen-anatomie der Chinesen,” by F. Birkner (Arch. f.
Anthropol., N.F., 1905, iv, 1-40), “Ueber die Zungen-papillen
und die Zungen-grösse der Japaner,” by K. Kunitomo (Zsch. f.
Morph. u. Anthrop., 1911, 14, 339-366), “The Facial Musculature
of the Japanese,” by T. Kudo (Jour. of Morph., 1919, 32,
637-680), “Racial Differences in Palm and Sole Configurations of
Japanese and Chinese,” by H. H. Wilder (diag. Amer. Jour. Phys.
Anthropol., 5: 143-206, April, 1922) and “Das Chinesen Gehirn,
ein Beitrag zur Morphologie und Stammes-geschichte der Gelben
Rasse,” by E. Kurz (Zsch. f. Anat. u. Entwickgesch., 1924, 72,
199-387). Those who are interested in these phases of racial
study may consult the above-mentioned articles and books.


1 This is primarily written as a report for Professor G. M. Ruch’s 
Seminar, University of California. 
STUDIES OF THE CHINESE


In the years 1915 to 1917, J. W. Creighton (1), under the 
direction of Pyle, made a study of the mental and physical 
characteristics of Chinese children. The subjects studied 
numbered approximately 500 and ranged in age from ten to 
eighteen. The mental tests used were the following: rote 
memory, logical memory, substitution, analogues and the spot 
pattern test. 

The results show that the Chinese children were better in rote 
memory but poorer in logical memory, substitution and ana- 
logues. In the spot pattern test where no language difficulties 
existed, the Chinese were on an entirely equal footing with 
Americans. 

In the year 1918 a comparison of Chinese college students in 
China with American college students in this country was made 
by Walcott (2). The Chinese students averaged twenty-two 
years of age and were in their sophomore or junior year. 
The Stanford Revision of Binet was used, and forty-four out of 
sixty-three, i.e., seventy per cent were found to have IQ’s 
above 100. 

Afterwards 190 freshmen of Hamline University, averaging
nineteen years of age, were tested. These, along with the
sixty-one adults studied by Terman, served as bases of
comparison.

It was found that the Chinese students showed equality or
superiority in such tests as “President and King,” problems of
fact, arithmetical reasoning, sense of selection and ingenuity,
whereas inferiority was shown in such tests as abstract pairs
and fables. The author thinks that the language difficulty
might account for the inferiority on abstract pairs, and a
difference in type of moralizing for that on fables.

In 1921 S. D. Lee (3) of the University of California, under
the direction of Bridgman, made a comparative study of
forty-six Chinese and forty-six American school children ranging
in age from four to fourteen, using the Goddard Revision of the
Binet-Simon Scale. Her conclusion is that, in spite of the
language handicap, the final scores show the two races to be
practically equal in terms of mental age.
TABLE 1
Age norms of Chinese in visual and auditory memory

                      VISUAL MEMORY              AUDITORY MEMORY
          NUMBER   Memory  Point  Per cent    Memory  Point  Per cent
  AGE    OF CASES   span   score   score       span   score   score
   7        16      4.4     1.5    10.7        4.7     2.9    20.7
   8        41      5.3     3.1    22.4        4.7     3.4    24.3
   9        66      5.8     4.5    32.1        5.2     3.2    22.8
  10        65      6.3     5.2    37.1        5.5     3.9    27.8
  11        65      7.1     6.5    46.5        6.1     4.8    34.3
  12        70      7.9     8.1    57.8        6.3     5.1    46.4
  13        78      7.9     8.5    60.7        6.5     5.3    37.8
  14        73      7.8     8.2    58.6        6.6     5.4    38.6
  15        46      7.9     9.0    64.2        6.7     5.6    40.0
  16        39      8.6     9.4    67.0        6.5     5.4    38.6
  17        21      8.4     8.8    62.8        6.5     5.5    39.3
  18        13      9.0    10.0    71.5        6.1     5.2    37.1
  Others     9
  Total    602

In the same year Kwok Tseung Yeung (4) of Stanford University
tested 109 Chinese children, all American born, ranging in age
from five to fourteen years. The Stanford Revision of the Binet
Scale was used, but with the vocabulary tests omitted and the
scoring readjusted accordingly.* No striking differences were
found between the intelligence of Chinese and American children:
the Chinese were at about the level of Americans and North
Europeans and markedly above South Europeans. The median I.Q.
for the Chinese group was 97, compared with 99 as found by
Terman for the 905 unselected American children he studied.
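The scoring footnote to Yeung's study gives 2.4 months of credit per test at levels with six tests and 3.4 months at levels with eight. Those figures are what one gets by dropping one vocabulary test and spreading the level's months evenly over the tests that remain (12/5 and 24/7), assuming the 1916 Stanford Revision layout in which a six-test year level is worth 12 months and the eight-test year XII level 24 months. The sketch below reproduces that arithmetic and the usual ratio I.Q.; the redistribution rule is an inference, and the function names are hypothetical.

```python
def credit_per_test(months_in_level: int, tests_in_level: int, omitted: int = 1) -> float:
    """Months of mental-age credit per remaining test, assuming the level's
    months are spread evenly over the tests left after `omitted` are dropped."""
    remaining = tests_in_level - omitted
    return round(months_in_level / remaining, 1)


def ratio_iq(mental_age_months: float, chronological_age_months: float) -> int:
    """Ratio I.Q. as used with the Stanford Revision: 100 * MA / CA, rounded."""
    return round(100 * mental_age_months / chronological_age_months)


# Assumed layout: a six-test year level spans 12 months; the eight-test level
# (year XII) spans 24 months. One vocabulary test is omitted from each.
print(credit_per_test(12, 6))  # 2.4
print(credit_per_test(24, 8))  # 3.4

# Hypothetical child: mental age 9 years 2 months, chronological age 10 years.
print(ratio_iq(9 * 12 + 2, 120))  # 92
```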

In 1924 Y. T. Hao (5) reported a study of the memory span
of Chinese school children in San Francisco. Two sets of tests
were used, one for visual and one for auditory presentation
(from Whipple’s Manual of Mental and Physical Tests). Each set
contains fourteen series of digits divided into seven pairs,
each pair having the same number of digits, ranging from four
to ten.

* Credit for each test passed was 2.4 months for the ages at
which there are six tests in the scale and 3.4 months for the
ages having eight tests.

The tests were given to 602 Chinese pupils in the Commodore 
Stockton School in San Francisco. Their ages ranged from 
seven to eighteen. Seventeen classes were tested, including 
three classes from the first grade, one class from the fourth 
grade, and two classes each from the second, third, fifth, sixth, 
seventh and eighth grades. 

TABLE 2
Means and sigmas in five tests of Chinese children

                  I.Q.    R.Q.    V.Q.    C.Q.    L.Q.
Mean..........    99.3    88.3    95.4    85.2    90.9
Sigma.........    17.5    14.1     4.0    13.5    17.8
[illegible]...    0.77    0.62    0.62    0.60    0.79

Each pupil was given three scores: (a) the memory-span score,
which is the maximum number of digits correctly reproduced;
(b) a point score, which is the number of series correctly
reproduced; and (c) a percentage score, the percentage of the
fourteen series correctly reproduced. The median of each of
these three kinds of scores for each age is given in table 1.
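The three scores can be stated as a small scoring function. The response format (one pass/fail flag for each of the fourteen series, in order of length) and the name `memory_scores` are assumptions made for illustration; the tabled values are medians across pupils, which this sketch does not compute. Because the percentage score is the point score divided by 14, a median point score of 4.5 corresponds to 32.1 per cent, the figure tabled for visual memory at age nine.

```python
# Each set: fourteen series, two at each length from four to ten digits.
SERIES_LENGTHS = [n for n in range(4, 11) for _ in range(2)]


def memory_scores(correct: list[bool]) -> tuple[int, int, float]:
    """Return (memory span, point score, percentage score) for one pupil.

    `correct` holds one pass/fail flag per series, in the order of SERIES_LENGTHS."""
    if len(correct) != len(SERIES_LENGTHS):
        raise ValueError("expected one result per series")
    passed = [length for length, ok in zip(SERIES_LENGTHS, correct) if ok]
    span = max(passed, default=0)          # longest series reproduced correctly
    points = len(passed)                   # number of series reproduced correctly
    percentage = round(100 * points / len(SERIES_LENGTHS), 1)
    return span, points, percentage


# Hypothetical pupil: both four- and five-digit series, one six-digit, one seven-digit.
pupil = [True, True, True, True, True, False, False, True] + [False] * 6
print(memory_scores(pupil))  # (7, 6, 42.9)
```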

In the same year P. M. Symonds (6) made a study of 513
Chinese children ranging in age from eight to seventeen and in
grade from four to eight. They were tested with the Pintner
Non-Language Mental Tests, the Thorndike-McCall Reading Scale
Form B, the Thorndike Tests of Word Knowledge Form A, the
Kelley-Trabue Completion Exercise Alpha and Charters’ Diagnostic
Language Test Miscellaneous A Form I. The results, expressed as
I.Q., R.Q., V.Q., C.Q. and L.Q. respectively, are given in
table 2.

In 1926 Graham (7) made an investigation of Chinese children
attending the Oriental School of San Francisco, a public school
exclusively for Chinese. About one-fifth of the seventy-three
children tested were born in China, but all were of Chinese
parentage and came from Chinese-speaking homes. The subjects
were all twelve years old.

The first series of individual tests was selected from Cornell’s
Graduated Scale for determining mental age. This was followed by
the Kohs Block-Design test. Next a series of group tests was
used: Mentimeter School Group 2a, Thorndike-McCall Reading Scale
Form 2, and National Intelligence Tests Scale A Forms 1 and 2.
After the conclusion of these group tests the children were
given another individual test, the Stanford-Binet.


TABLE 3
Differences of Americans from Chinese on tests involving
little or no language

TESTS                                   DIFFERENCES   [heading illegible]
Cornell:
  I    Objects remembered.............     -0.33           0.175
  II   Digits remembered..............      0.78           0.094
  IV   Odd figures learned............     -0.86           0.344
  VI   Problems solved................      0.54           0.181
Mentimeter:
  I    Pictorial absurdities..........      2.36           0.539
  II   [illegible]....................      0.44           0.366
  III  Geometric figures..............     -1.29           0.350
Mental ages:
  Kohs Block-Design...................    -10.5 months
  [illegible].........................      7.0 months

From the findings, the author concludes that, “in pure
memory processes of the visual type (Cornell I and IV), where
meaning and language are kept at a minimum, the Chinese is
fully the equal of the American, but this does not hold true of
auditory memory. He evinces superior ability in certain types
of concrete problem solving where the nature of the response
may be described as a sensorimotor one; evidence for this is
found in his performance on Kohs’ Block-Design Test, and on
Mentimeter III. In solving other types of problems, however,
he is inferior.” Table 3 shows the differences of Americans
from Chinese on tests involving little or no language.


STUDIES OF THE JAPANESE 


In 1922 Fukuda (8) tested forty-three Japanese children, aged
three to twelve, with a modified Binet scale. The mean IQ was
found to be 98.

In 1926 Darsie (9) made a study of American-born Japanese 
children. The group tested includes nearly 700 Japanese 
children who were supposed to constitute a thoroughly non- 
selected sampling of the Japanese population of California of 
the ages ten to fifteen. Excluding the strictly rural children, 
there were left more than five hundred with whom English 
was a more familiar tongue than Japanese. 

The tests used were the Stanford Revision of the Binet, the
Army Beta and the Stanford Achievement Test. In addition,
teachers were asked to rate the Japanese children in comparison
with their estimate of average American children of the same
age or grade. Such ratings were secured on all school subjects
and also upon nineteen general social traits.

On the basis of the results, the author concludes: “While any
differences in general mental capacity are almost certainly
slight, the evidence points rather definitely to the following:








1. Japanese children are inferior to those of American and Northern 
European parentage in mental processes involving memory and abstract 
thinking based on meanings or concepts represented by the verbal sym- 
bols of the English language. 

2. Japanese children are at least equal and possibly superior to
American in mental processes involving memory and thinking based
upon concrete, visually presented situations of a non-verbal character.

3. Japanese children are superior to Americans in mental processes 
involving acuity of visual perception and recall and tenacity of at- 
tention. 










* Dividing geometric figures. 
As to the teachers’ judgments, they corroborate the test
findings. Combining the two sets of data, the following
conclusions are indicated:


1. In reading and language Japanese children are markedly inferior 
to American. 

2. In informational subjects depending partly or largely upon read- 
ing, Japanese are slightly inferior to American children. 

3. In arithmetic and spelling the differences are negligible. 

4. In penmanship, drawing and printing, Japanese children are 
superior to American. 

5. Japanese children impress teachers as being less self-confident, 
freer from vanity, and more sensitive to approval than American chil- 
dren. They are also rated as more stable emotionally and more re- 
sponsive to beauty than Americans. In originality and general intelli- 
gence, American children are judged superior. With regard to the more 
definitely moral-social traits, such as sympathy, generosity, con- 
scientiousness, truthfulness and school application and deportment, no 
significant differences appear. 


In this connection it may also be mentioned that Kubo (10)
has standardized a Japanese translation of the Binet Scale,
including an extension embodying certain elements from the
Otis as well as the Army Alpha and Beta Tests. The
standardization is reported to have been made upon 1200 Tokyo
children of ages two to fifteen. The scale thus standardized
was given to 536 non-selected Tokyo children, aged six to nine.
From the distribution of IQ’s as reported, we have found the
mean IQ to be 98.2, the median IQ to be 98.7 and the S.D. to
be 9.
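The mean, median and S.D. quoted for Kubo's children were computed from a reported distribution of IQ's. A minimal sketch of the usual grouped-data formulas follows: every case is placed at the midpoint of its class interval for the mean and S.D., and the median is interpolated within the interval holding the middle case. The distribution below is invented for illustration and is not Kubo's; treating each lower limit as the exact lower bound of the interval is one of several conventions, and whether it matches the one used here is an assumption.

```python
import math

# Hypothetical grouped distribution (NOT Kubo's data): lower limits of
# ten-point IQ class intervals and the number of children in each.
WIDTH = 10
LOWER_LIMITS = [70, 80, 90, 100, 110, 120]
FREQUENCIES = [4, 20, 40, 46, 18, 2]


def grouped_stats(lows, freqs, width):
    """Mean, interpolated median and S.D. of a grouped frequency distribution,
    treating every case as lying at the midpoint of its interval."""
    n = sum(freqs)
    mids = [low + width / 2 for low in lows]
    mean = sum(m * f for m, f in zip(mids, freqs)) / n
    variance = sum(f * (m - mean) ** 2 for m, f in zip(mids, freqs)) / n

    # Median: find the interval holding the (n/2)-th case, then interpolate
    # linearly within it.
    half, cumulative, median = n / 2, 0, float("nan")
    for low, f in zip(lows, freqs):
        if cumulative + f >= half:
            median = low + (half - cumulative) / f * width
            break
        cumulative += f
    return mean, median, math.sqrt(variance)


mean, median, sd = grouped_stats(LOWER_LIMITS, FREQUENCIES, WIDTH)
print(f"mean {mean:.1f}, median {median:.1f}, S.D. {sd:.1f}")
```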


STUDIES INCLUDING BOTH CHINESE AND JAPANESE 


During the winter of 1924 and 1925 Sandiford (11) studied
a number of Oriental immigrants in Vancouver, using the
Pintner-Paterson Scale of Performance Tests. Among the subjects,
224 were Chinese and 276 were Japanese. The method of scoring
first used was that of the year scale, which gave a mental age
from which the IQ’s were calculated.

TABLE 4
Means and sigmas of each race group in social status and
intelligence measures

                       NUMBER
                      OF CASES    M     σ      M     σ       M       σ       M      σ
Anglo-Saxon..........    57     6.45  1.15   3.84  0.94   229.81  61.28   66.43  14.08
[illegible]..........    58     5.72  1.22   3.15  0.84   166.47  54.12   56.11  17.74
[illegible]..........    61     5.60  1.00   2.68  0.93   159.42  41.53   61.41  16.22


TABLE 5
Measures of moral traits. Percentage of each race group which
overlaps the Anglo-Saxon median (corrected sigmas are used)

                                       ANGLO-
                                       SAXON   CHINESE   [?]     [?]
“Honesty” Test........................   50      87      99.9    165
[illegible]...........................   50      90      50       58
Chassell-Upton Citizenship Scale......   50      79      66       89
Ambition:
  Teacher’s Estimate..................   50      70      62       31
  [illegible].........................   50      34      [?]      99
Perseverance:
  Teacher’s Estimate..................   50      63      50       31
  [illegible].........................   50      93      92       98
Trustworthiness:
  Teacher’s Estimate..................   50      63      50       31
  [illegible].........................   50      78      25       96
Self-assertion:
  Teacher’s Estimate..................   50      29      27       31
  [illegible].........................   50       0.3    20       92
Sensitiveness to Public Opinion:
  Teacher’s Estimate..................   50      68      71       31
  [illegible].........................   50      69      87       96
Control of Emotions:
  Teacher’s Estimate..................   50      65      99.6     31
  [illegible].........................   50      97      78       97
[illegible]...........................   50      75.17   69.17


The median IQ of Japanese males was 115.4; of Japanese
females, 112.8; of both together, 114.2. The median IQ of
Chinese males was 107.77; of Chinese females, 107.0; of both
together, 107.4. The showing of the Chinese is lower than that
of the Japanese but better than that of white children: more
than 71 per cent of the Chinese reach or exceed the median
score of the whites.
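Sandiford's "per cent reaching or exceeding the median of the whites", like the overlap percentages of Table 5, locates one group's distribution relative to another group's median. If a group's scores are assumed to be normally distributed, that percentage follows from the group's mean and S.D. The sketch below uses Python's `statistics.NormalDist` with hypothetical figures; neither study says it computed its percentages this way (they may have counted cases directly), and the "corrected sigmas" of Table 5 are not modeled.

```python
from statistics import NormalDist


def percent_at_or_above(reference_median: float, group_mean: float, group_sd: float) -> float:
    """Percentage of a normally distributed group scoring at or above a reference median."""
    group = NormalDist(mu=group_mean, sigma=group_sd)
    return round(100 * (1 - group.cdf(reference_median)), 1)


# A group overlaps its own median by exactly 50 per cent, which is why the
# reference (Anglo-Saxon) column of an overlap table reads 50 throughout.
print(percent_at_or_above(100.0, 100.0, 15.0))  # 50.0

# Hypothetical group whose mean lies half an S.D. above the reference median.
print(percent_at_or_above(100.0, 110.0, 20.0))  # 69.1
```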

The author points out that this superiority is probably due to
selection. First, it is the Japanese and Chinese possessing the
qualities of cleverness, resourcefulness and courage who have
emigrated to British Columbia. Second, the relatively more
intelligent Chinese and Japanese children are sent to school in
higher proportion than obtains among the whites.


TABLE 6
Modified Binet results, 651 cases

                                     MENTAL          MENTAL
AGE  RACE       SEX    CASES          AGE    I.Q.     AGE    I.Q.    S.D.    P.E.
 9   Japanese   Boys     42           8.61   91       8.71   93      1.06    0.078
     Chinese    Boys     35           8.67   82       8.95   95      1.07    0.086
     Japanese   Girls    35           8.88   94       9.0    94      0.84    0.067
     Chinese    Girls    36           9.37   99       9.21   97      1.11    0.088
12   Japanese   Boys     37          10.42   84      10.11   83      1.06    0.083
     Chinese    Boys     37          10.64   85.5    10.35   85      1.23    0.096
     Japanese   Girls    39          10.07   81      10.25   82.5    1.21    0.092
     Chinese    Girls    36          10.07   81       9.84   80      1.11    0.088
14   Japanese   Boys     39          11.25   80      11.1    80      1.18    0.090
     Chinese    Boys     35          11.78   84      11.92   85.5    1.47    0.118
     Japanese   Girls    37          10.93   78      10.89   78      1.09    0.085
     Chinese    Girls    33          11.44   82      11.31   81      1.33    0.110

In 1925 Murdoch (12) conducted a psychological investigation in
Honolulu, the purpose of which was to determine the main
differences between the most commonly represented races there.
Eleven racial and mixed-racial groups were tested with the
National Intelligence and Army Beta tests, and estimates of
general intelligence in the different races, made by the
supposedly most intelligent and best-informed residents of the
city, were also secured. But as this review is limited to studies
of the Chinese and Japanese, only the results for these two
peoples will be presented, in tables 4 and 5, in comparison with
the Anglo-Saxon.

In the same year Porteus and Babcock (13) made an extensive
investigation of the various racial groups in Hawaii. All the
examinations were carried out in the public schools, the great
majority of the cases being in attendance in city schools. In
all, eight schools were taken, four of them city schools and
four large rural schools. A modified Binet Scale was applied,
the plan being to examine all the nine-, twelve- and
fourteen-year-old children available in each school. According
to the authors, the children were unselected except on the basis
of age, and were representative of their racial groups in Hawaii.

Before undertaking this research, three hundred cases had been
examined by the staff of the University Psychological Clinic. On
the basis of these results certain modifications in the Binet
were made. These modifications consisted in eliminating the tests
of the Stanford Revision which their experiments showed were most
affected by language disabilities. Those not used in the scoring
included the vocabulary test for each year, definition of abstract
words (years XII and XVI) and changing clock hands (XIV). For this
and other reasons, further omissions and changes were made.

Here only the findings concerning the Chinese and Japanese 
will be reported. They are given in table 6. 
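The P.E. column of table 6 appears to be the probable error of the standard deviation. The sketch below assumes the textbook formula P.E. = 0.6745 σ / √(2N), which the source does not state, and recomputes that column from the number of cases and the S.D.; for the rows tried it agrees with the printed values to within 0.001. The function name is illustrative only.

```python
import math


def pe_of_sd(sd: float, n: int) -> float:
    """Probable error of a standard deviation (assumed formula: 0.6745 * sd / sqrt(2n))."""
    return 0.6745 * sd / math.sqrt(2 * n)


# (group, number of cases, S.D., printed P.E.) copied from table 6
ROWS = [
    ("Japanese boys, age 9", 42, 1.06, 0.078),
    ("Chinese boys, age 9", 35, 1.07, 0.086),
    ("Chinese girls, age 12", 36, 1.11, 0.088),
    ("Chinese girls, age 14", 33, 1.33, 0.110),
]

for group, n, sd, printed in ROWS:
    print(f"{group}: computed {pe_of_sd(sd, n):.3f}, printed {printed:.3f}")
```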

These results show that, as far as learning capacity as measured
by the Binet Test is concerned, the advantage lies with the
Chinese. Suspecting that other differences in the test reactions
were concealed when the results were stated in terms of averages
or medians, the authors divided the tests into two categories:
those which are more directly tests of native capacity, and those
which are affected most by school training or differences in life
experience. To the former belong tests 3 and 4 of year VII, test 2
of year IX, tests 3, 4, 6 and alt. 1 of year X, tests 3 and alt. 1
of year XII, test 2 of year XIV and tests 4, 5 and 6 of year XVI;
to the latter are assigned test 2 of year VII, tests 2, 5, alt. 1
and alt. 2 of year VIII, tests 3, 4 and 5 of year IX, tests 4 and 5
of year XII, tests 3, 4 and 5 of year XIV and test 2 of year XVI.

In tests dependent principally on native ability the Chinese boys
excelled the Japanese by only a few points, the scores being
Chinese 745.5 and Japanese 737. As regards the tests affected more
directly by environment, the superiority of the Chinese was
greater: the total scores were Chinese 848 and Japanese 815.


TABLE 7
Diagnostic score from Binet Test, Vineland Revision

TEST                                          YEARS
[illegible]                                      5
[illegible]                                      6
[illegible]                                      7
[illegible]                                      8
[illegible]                                      9
X-3       Repeat 4 digits backwards             10
[illegible]                                     11
[illegible]                                     12
[illegible]                                     12
XII-5     Interpretation of pictures            13
XII-Alt. 1  Repeat 5 digits backwards           14
[illegible]                                     15
XIV-5     Mental arithmetic                     15
[illegible]                                     16
[illegible]                                     17
[illegible]                                     17
[illegible]                                     18
[illegible]                                     18

A further analysis of these results was undertaken to see what
effect age and degree of maturity had on comparative scores,
especially with regard to the children who passed tests at the
higher levels of attainment. Accordingly the number of tests
passed at twelve years or above was calculated and taken as an
index of the amount of superior ability in each group. Here again
the Chinese lead: their score is 105, while that of the Japanese
is 73.

Considering the tests of native ability in the same way, the same
racial tendencies are observable, but to a more marked degree.
Taking the records of the twelve- and fourteen-year-old children
together, it is found that the Chinese again occupy the first
place with a score of 98.5, while the Japanese have a score of
93.5.


TABLE 8
Form and assembling test results

             NUMBER OF CASES   I.Q.    NUMBER OF CASES   I.Q.
Chinese            118         102           158          95
Japanese           207         101           131         101.6
Americans          724         100.8         795          96.8


TABLE 9
Porteus maze results

                 NUMBER OF CASES   AVERAGE IQ
Chinese boys           200            95.3
Chinese girls                         88.9
Japanese boys          208           106.9
Japanese girls                        96.8
American boys                         99.2


TABLE 10
Intelligence Quotients on Goodenough Intelligence Test

             MEDIAN   MEAN    S.D.   COEFF. OF VAR.
American     100.3    101.5   18.3       18.0
Chinese      103.1    104.1   18.0       17.2
Japanese      99.5    101.9   18.0       17.7


Another analysis was made by means of diagnostic scores, 
which are given in table 7. According to these results, the
Japanese at nine years had an average IQ of 92.5, at twelve 
years 93, and at fourteen years 93. The Chinese boys’ average 
IQ’s at these three periods were 92, 95.5 and 94. This again 
shows that the Chinese are superior to the Japanese. 

In order to obtain racial comparisons from a different angle, 
the Porteus Form and Assembling Test was employed. This 
test combines the features of a form board and a logical rela- 
tions test. The results expressed as an intelligence quotient 
(relation of score to age) are presented in table 8. 

In determining temperamental differences the Porteus Maze test
was used. The children examined by this test were the same groups
to whom the Modified Binet had been given. The results are given
in table 9. They seem to indicate that in the qualities thereby
measured the Japanese are superior to the Chinese.

Racial traits were also determined by means of rating scales. The
Japanese were found to be superior in such traits as resolution
and planning capacity, the Chinese in such traits as tact,
dependability and self-determination.

In 1926 Goodenough (14) tested 2,457 public school children, 
practically all of whom were American born, but in whose 
immediate ancestry a number of racial stocks were represented. 
The test used was the Goodenough Intelligence Test for young 
children. This test is based upon drawings of the human 
figure, and is completely independent of language. The 
findings are shown in table 10. 

The author concludes: 


There is no reason for thinking these children to be other than fairly 
representative of their several racial groups as found in this country, 
except in the case of American children. In order to be absolutely 
fair to the foreigners, it was decided not to include any schools from 
superior residential districts in these distributions. Since the test used 
is entirely non-verbal, these differences cannot be explained on the 
basis of a linguistic handicap. 
SUMMARY AND CRITICISM


By way of summary, important facts about these studies are given
in tables 11, 12, 13 and 14. In table 11 a few of the dates
pertain to the publication of the articles, since the time of
investigation cannot be determined from them. However, if these
dates are at all reliable, they seem to indicate a growing
interest in such racial studies. Along with this growth of
interest there is a definite tendency to place less and less
premium upon the linguistic elements of tests. These tendencies
may certainly be looked upon as hopeful signs in racial
psychology. However, certain defects of the studies are clearly
apparent in this table. One will notice that the social status of
the subjects has seldom been considered and that, in some cases,
even age data have been omitted.

Table 12 gives the means and sigmas of the Chinese, the Japanese
and the American groups. Strange as it may seem, in some of the
studies means are not given, and in most of them sigmas are not
computed, so most of these values have had to be computed from
whatever data are available. Here I may also point out that the
uniformity of the age or grade range of the groups compared has
failed to receive due attention in some of these investigations.
This statistical weakness has to be kept in mind in evaluating
the results.

Considering this table as it stands, we may observe the following
facts. On the Goddard and Stanford revisions of the Binet, the
Porteus Form and Assembling Test, the Pintner-Paterson Performance
and the Goodenough Intelligence Test, the Chinese and the
Americans are about equal. The latter, however, are superior on
the National Intelligence, Army Beta, Porteus Maze, Cornell and
Mentimeter tests, whereas the former are superior on Block Design.
In this connection it may also be mentioned that, according to
both Pyle and Hao, the Chinese are better in rote memory and equal
to Americans in spot pattern. In such tests as logical memory,
substitution and analogies, the Americans are better. For the
latter data, readers may consult the original tables of these
authors.
TABLE 13
Reliabilities of differences between the races compared

                         RACES                                          CHANCES
TEST                     COMPARED   σ AV.   σ DIFF.   DIFF.   D/σ DIFF.  IN 100
Goddard Binet            Chinese    2.43     2.86     -3.1*     1.08      85.6
                         American   1.51
Stanford Binet           Chinese    1.29     1.37     -3.8      2.77      99.7
                         American   0.48
Stanford Binet           Japanese   0.66     0.89     -8.1      9.10     100
                         American   0.61
Army Beta                Chinese    2.32     2.97    -10.3      3.46     100
                         American   1.86
Army Beta                Japanese   2.07     2.79     -5        1.79      96
                         American   1.86
Army Beta                Chinese    2.32     3.10     -5.3      1.70      96
                         Japanese   2.07
Army Beta (Darsie)       Japanese   0.70     0.84     +5.4      6.42     100
                         American   0.46
Goodenough Intelligence  Chinese    3.60     3.69     +2.6      0.70      76
  Test                   American   0.82
Goodenough Intelligence  Japanese   2.77     2.89     +0.4      0.14      55.6
  Test                   American   0.82
Goodenough Intelligence  Chinese    3.60     4.54     +2.2      0.48      68.2
  Test                   Japanese   2.77
National Intelligence    Chinese    7.10    10.77    -63.3      5.87     100
                         American   8.11
National Intelligence    Japanese   5.32     9.69    -70.4      7.26     100
                         American   8.11
National Intelligence    Chinese    7.10     8.87     +7.1      0.80      79
                         Japanese   5.32
Pintner-Paterson         Chinese    1.08     1.45     -5.7      3.93     100
                         Japanese   0.98
Cornell                  Chinese    0.36     0.54     -2.5      4.62     100
                         American   0.41
Mentimeter               Chinese    2.39     3.13    -35.9     11.46     100
                         American   2.03
National A-1             Chinese    3.24     4.72    -47.7     10.10     100
                         American   3.43
National A-2             Chinese    3.26     3.43    -56.3     16.41     100
                         American   1.06

* Plus and minus signs indicate the difference of the first from the
second of each pair.























As to the Japanese, their mean IQ on the Stanford Binet is lower
than that of the Americans, although approximate equality exists
on both the Modified and the Japanese Binet. On the National
Intelligence test the Japanese are inferior, while on the Porteus
Form and Assembly Test, the Porteus Maze and the Goodenough
Intelligence Test they compare favorably with Americans.

Comparing the Japanese with the Chinese we find that the 
former excel in Porteus Form and Assembly Test, Porteus 
Maze, Army Beta and Pintner-Paterson; whereas, in Porteus 
Modified Binet and National Intelligence, the latter are 
superior. 

The reliabilities of these differences are given in table 13. 
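Table 13 can be checked against the usual formula for the reliability of a difference between two independent means, which the source does not spell out: σ diff. = √(σ₁² + σ₂²), where σ₁ and σ₂ are the σ av. entries, and the ratio D/σ diff. is read against the normal curve. The sketch below assumes that reading (the function name is illustrative). It recovers the σ diff. and D/σ diff. columns to within rounding; the "chances in 100" it returns, a one-sided normal probability, may differ from the printed figures by up to about half a point.

```python
import math


def reliability_of_difference(sigma_a: float, sigma_b: float, diff: float):
    """Return (sigma_diff, critical ratio, chances in 100) for a difference of two means.

    sigma_a and sigma_b are the standard errors of the two means (the
    "sigma av." column of table 13). The groups are assumed independent,
    so the two errors combine as the square root of the sum of squares.
    """
    sigma_diff = math.sqrt(sigma_a ** 2 + sigma_b ** 2)
    ratio = abs(diff) / sigma_diff
    # One-sided normal probability that the true difference has the observed sign.
    chances = 100 * 0.5 * (1 + math.erf(ratio / math.sqrt(2)))
    return sigma_diff, ratio, chances


# Two rows of table 13: (comparison, sigma av. of first group, of second, difference)
examples = [
    ("Goodenough, Chinese vs. Japanese", 3.60, 2.77, +2.2),
    ("Stanford Binet, Chinese vs. American", 1.29, 0.48, -3.8),
]
for label, sa, sb, d in examples:
    s, cr, ch = reliability_of_difference(sa, sb, d)
    print(f"{label}: sigma diff {s:.2f}, D/sigma diff {cr:.2f}, chances {ch:.1f}")
```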

Now let us turn to the aspect of variability, which is shown in
table 14. Here it may be noticed that in the tests in which the
Chinese approximate or surpass the Americans, their variability is
also about equal to that of the latter, whereas in the tests in
which they show inferiority their variability is also greater. The
former fact needs no further explanation; the latter seems worth
some analysis. Porteus has pointed out an important fact, namely
that the language factor weighs more heavily in National
Intelligence than in Binet. This accounts at least in part for the
greater variability of the Chinese in the former test as well as
their inferiority, for there are undoubtedly differences in the
extent to which English is used in different homes. To the factor
of linguistic handicap is often added that of peculiarity in ways
of thinking, resulting from differences in social and historical
background. It may be safely assumed that the parents of the
Chinese children tested had been in America for unequal lengths of
time, and this difference has undoubtedly affected the mental
constitution of the subjects unequally. Moreover, in the case of
Graham’s group, we are told that one-fifth of the 73 children
tested were born in China, and that the numbers of children tested
in the Cornell, Binet, Block Design, Mentimeter, National A-1 and
A-2 tests were 63, 62, 62, 56, 57 and 59 respectively. How the
children from China were distributed among these we do not know,
but this much we can say: since they were not excluded, their
scores must have weighed more heavily in each of these smaller
groups than in the total of 73. These hypotheses, however, do not
explain the greater variability and inferiority of the Chinese in
the Army Beta. The results of this test are evidently capable of
various interpretations, but it is unsafe to attempt any in view
of the meagerness of the present material.
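Variability comparisons between groups whose means differ are often made with the coefficient of variation, V = 100σ/M, which table 10 already reports for the Goodenough test. The short sketch below recomputes that column from the printed means and S.D.’s and agrees with it to within 0.1. It illustrates the measure only; whether table 14 uses V or the raw σ is not shown in this section, and the function name is illustrative.

```python
def coefficient_of_variation(sd: float, mean: float) -> float:
    """Coefficient of variation in per cent: 100 * S.D. / mean."""
    return 100 * sd / mean


# (group, mean IQ, S.D., printed coefficient of variation) from table 10
TABLE_10 = [
    ("American", 101.5, 18.3, 18.0),
    ("Chinese", 104.1, 18.0, 17.2),
    ("Japanese", 101.9, 18.0, 17.7),
]

for group, mean, sd, printed in TABLE_10:
    v = coefficient_of_variation(sd, mean)
    print(f"{group}: V = {v:.2f} (printed {printed})")
```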

With regard to the Japanese, both their inferiority and their
variability are also explainable in terms of the above hypotheses.
However, whether or not such racial differences can be entirely
explained away in this manner is a question that can be settled
only by more carefully controlled investigation.

In conclusion, the writer wishes to emphasize the importance, in
future studies of this kind, of remedying the defects found in
some of the investigations here discussed, namely, the neglect of
the social status of subjects, the variation in age or grade range
of the groups compared, the failure to adapt material to racial
differences in ways of thinking and the failure to consider the
aspect of variability. In this connection attention may be called
to the fact that, in order to make racial studies thorough and
comprehensive, we must direct our efforts first of all to a
twofold task: to improve upon present tests of adult intelligence,
and to devise valid methods of measuring all phases of human
behavior. Until this task is accomplished it is unsafe to make
generalizations about racial differences.


REFERENCES 


(1) Pyle, W. H.: A study of the mental and physical characteristics
    of the Chinese. School and Soc., 1918, 8, 264-269.

(2) Walcott, G. D.: The intelligence of Chinese students. School and
    Soc., 1920, 11, 474-480.

(3) Lee, S. D.: A comparative study of normal Chinese and American
    children. University of California Thesis M.A., 1921
    (unpublished).

(4) Yeung, K. T.: Intelligence of Chinese children in San Francisco
    and vicinity. Jour. Applied Psychology, 1921, 5, 267-274.

(5) Hao, Y. T.: The memory span of 600 Chinese school children in
    San Francisco. School and Soc., 1924, 20, 507-510.

(6) Symonds, P. M.: The intelligence of the Chinese in Hawaii.
    School and Soc., 1924, 19, 442.

(7) Graham, V. T.: The intelligence of Chinese children in San
    Francisco. Jour. Comp. Psychol., 1926, 6, 43-72.

(8) Fukuda, T.: Some data on the intelligence of Japanese children.
    Amer. Jour. Psychol., 1923, 34, 579-601.

(9) Darsie, M. L.: The mental capacity of American-born Japanese
    children. Comp. Psychol. Monog., 1926, 3 (No. 15), 1-89.

(10) Kubo, Y.: The revised and extended Binet-Simon Tests applied
     to the Japanese children. Ped. Sem., 1922, 29, 187-194.

(11) Sandiford, P., and Kerr, R.: Intelligence of Chinese and
     Japanese children. Jour. Educ. Psychol., 1926, 17, 361-367.

(12) Murdoch, K.: A study of differences found between races in
     intellect and morality. School and Soc., 1925, 22, 628-632 and
     659-664.

(13) Porteus, S. D., and Babcock, M. E.: Temperament and Race, 1926.

(14) Goodenough, F. L.: Racial differences in the intelligence of
     school children. Jour. Exper. Psychol., 1926, 9, 388-397.











THE PEARSON FORMULA, AND A FURTHER NOTE 
ON THE KUHLMANN-ANDERSON TESTS 


F. KUHLMANN 


Director, Division of Research, Minnesota State Department of Public 
Institutions 


The writer is not a statistician, and what is said here on the
[...]tions is not an attempt to give the reader anything new. The
shortcomings of the Pearson formula have been stated many times,
and some devices for making corrections are at hand. But most
users and writers have proceeded in more or less complete
disregard of this fact, as though the formula were a perfect, if
not the only, means of determining correlations. The mental test
literature, in particular, is replete with Pearson r’s
unaccompanied by anything like the adequate statement of the
nature of the material that proper evaluation requires.
Apparently the majority of school men and others with a limited
knowledge of statistical procedures have come to accept Pearson
“coefficients of validity” and “of reliability” at their face
value, without considering any further details as to how they
were obtained. Yet anyone who attempted to determine the absolute
or even relative merits of the mental tests so far published from
the Pearson r’s obtained for them would have a difficult task
indeed. It seems not to be difficult to get reliability
coefficients for any battery of tests that range from the fifties
to the high nineties, or, in other words, from an indication that
the tests are very poor to an indication that they are near
perfection in reliability. In addition, it seems always to be
assumed that the higher the correlation, the better the tests are
thereby shown to be. Yet a correlation may obviously be high
because the same factors cause similar errors in both series of
measures between which the correlation is computed. In the
writer’s opinion this is just what has occurred in a great many of
the studies that report high correlations on given tests.

The fundamental requirement of any device for determining 
the amount of relationship between two series of measures 
should be that each case contribute to the correlation found by 
the device in proportion to the amount of agreement between 
the two measures on each case. It is on this requirement that 
the Pearson formula fails completely. The two faults about which
we have heard most are, first, that the coefficient of
correlation, r, is affected by the amount of variability in the
measures from case to case: r increases with the standard
deviation. Second, it is affected by the nature of the
distribution of the measures: this distribution must be “normal”
or a distorted r will result. The two are not independent.
Analyzed further, this means that the formula makes two types of
errors which cancel each other only when the distribution of
measures is normal, which it never entirely is, and very often is
far from being. It means that the two measures on a given case
will affect r in proportion as the measures are above or below
the averages, rather than in proportion to the amount of agreement
between them. And inasmuch as the larger deviations from the
average are relatively infrequent, it is this relatively small
number of cases that plays the larger part in determining the
magnitude of r. Again, whenever a case measures below average in
one series and above average in the other, so that d₁d₂ is
negative, it will decrease r, although the actual agreement
between the two measures on this case may be close, and much above
the average agreement for the given data. In short, the actual
working of the Pearson formula reveals a number of ways in which
it fails to permit each case to contribute to the coefficient of
correlation in proportion to the agreement between the two
measures on it.
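The argument about how individual cases drive r can be made concrete by writing r as a sum of per-case terms: with deviations d₁, d₂ from the two means and standard deviations σ₁, σ₂ taken over n, r = Σd₁d₂ / (nσ₁σ₂). The sketch below uses five hypothetical pairs of mental ages, invented for illustration and not taken from any study here. Pupil 4 differs by only 2 months on the two batteries yet slightly lowers r, because its deviations have opposite signs; pupil 5 differs by 3 months but supplies most of r, because it lies far from the averages.

```python
import math


def pearson_contributions(x, y):
    """Return Pearson r and each case's additive share of it.

    With deviations d1, d2 from the two means and standard deviations s1, s2
    computed over n, r = sum(d1 * d2) / (n * s1 * s2), so case i contributes
    d1_i * d2_i / (n * s1 * s2).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    d1 = [v - mx for v in x]
    d2 = [v - my for v in y]
    s1 = math.sqrt(sum(d * d for d in d1) / n)
    s2 = math.sqrt(sum(d * d for d in d2) / n)
    shares = [p * q / (n * s1 * s2) for p, q in zip(d1, d2)]
    return sum(shares), shares


# Hypothetical mental ages (months) of five pupils on two batteries.
x = [100, 118, 121, 124, 160]
y = [102, 121, 118, 126, 157]
r, shares = pearson_contributions(x, y)
for i, (a, b, s) in enumerate(zip(x, y, shares), start=1):
    print(f"pupil {i}: {a} vs {b}, difference {abs(a - b)}, share of r {s:+.3f}")
print(f"r = {r:.3f}")
```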

The following illustration shows some of these ways. One hundred
thirty-six pupils in grade V were given the Detroit Alpha group
intelligence tests, and also the Kuhlmann-Anderson tests.¹ The
Pearson r was computed for the mental ages in months obtained by
the two test batteries for the 136 cases. Four other r’s were then
computed as follows. (A) The twenty cases whose mental ages were
highest were eliminated from the 136; for these, d₁d₂ was +299 and
over. (B) Returning to the original series of 136, the twenty
cases whose mental ages were least above average were next
eliminated; for these, d₁d₂ was 0 to +30. (C) Fifteen cases were
eliminated whose mental ages were nearest average, but where for
each case the mental age obtained on one test battery was above
average while the mental age obtained on the other was below
average; for these, d₁d₂ was -1 to -24. (D) Fifteen cases were
eliminated from the original 136 for which d₁d₂ was -25 or less.
Table 1 gives the effect of these eliminations on r, together with
other figures. O is the original series of 136 pupils.


TABLE 1

        r       σ₁     AV. DIF. IN M.A.
O     0.794   18.52        10.18
A     0.682   15.59         9.52
B     0.912   20.03        10.77
C     0.803   19.70        10.52
D     0.866   19.37         9.21

Under σ₁ are given the standard deviations for the Detroit tests,
and under σ₂ the same for the Kuhlmann-Anderson tests. The
difference between the two mental ages on each case for the two
test batteries was taken as a direct measure of the amount of
agreement, and the average of these differences is given under
“Av. dif. in M.A.” D gives another measure of this agreement, which
is explained below. So long as we limit ourselves to these
particular data, the average difference in mental ages presents no
serious fault as a measure of agreement. Compare now the changes
in r with the actual changes in agreement as certain cases are
eliminated. These eliminations present nothing more than
departures from the normal distribution of the measures, and these
departures are no greater than are regularly met in actual
practice. In “A” the agreement between the mental ages for the two
test batteries is better than in “O”: the average difference in
mental age drops from 10.18 months to 9.52. But r indicates a
poorer agreement. In “B” the actual agreement for the remaining
cases is poorer than in “O,” but r indicates a better agreement.
In “C” the actual agreement is slightly poorer, but r indicates a
slightly better agreement. In “D” alone does r change in the right
direction, indicating a better agreement when it is better.

¹ Intelligence Tests for Ages Six to Maturity. By F. Kuhlmann and
Rose Anderson. Educational Test Bureau, Minneapolis, Minn., 1927.

The Kuhlmann-Anderson Intelligence Tests Compared with Seven
Others. By F. Kuhlmann. Jour. Appl. Psychol., December, 1928.
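The eliminations A to D can be imitated on any paired series. The sketch below, on hypothetical data rather than the study’s 136 pupils, drops the cases whose deviation product d₁d₂ falls in a chosen range (deviations taken from the means of the full series) and recomputes r and the average difference. Removing the one near-average pupil whose two scores fall on opposite sides of the averages behaves like elimination C: r rises slightly although the average difference grows. The function names are illustrative only.

```python
import math


def r_and_avg_diff(x, y):
    """Pearson r and the average absolute difference between paired measures."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    avg_diff = sum(abs(a - b) for a, b in zip(x, y)) / n
    return r, avg_diff


def drop_by_product(x, y, lo, hi):
    """Remove the cases whose deviation product d1*d2 lies in [lo, hi].

    Deviations are measured from the means of the full (uneliminated) series.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    kept = [(a, b) for a, b in zip(x, y) if not lo <= (a - mx) * (b - my) <= hi]
    return [a for a, _ in kept], [b for _, b in kept]


# Hypothetical mental ages (months) on two batteries; not data from the study.
x = [100, 118, 121, 124, 160]
y = [102, 121, 118, 126, 157]

r_all, diff_all = r_and_avg_diff(x, y)
x_c, y_c = drop_by_product(x, y, -25, -0.5)  # drops the near-average, opposite-sign case
r_c, diff_c = r_and_avg_diff(x_c, y_c)
print(f"all cases:  r = {r_all:.4f}, average difference = {diff_all:.2f}")
print(f"after drop: r = {r_c:.4f}, average difference = {diff_c:.2f}")
```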

The average difference between measures as an index of 
amount of agreement between the two series of measures has, 
of course, a limited application. The measures in the two 
series must be the same in kind, and the magnitude of the 
averages must be the same if comparisons are to be made from 
one set of data to another. If the measures are in mental ages, 
or raw scores, for example, comparisons between data from two 
different mental levels could not be made. But if IQ’s were 
substituted for mental ages the average difference in IQ’s 
between two series of measures by two test batteries would be 
a far better index of agreement than the Pearson r at any 
mental level. 

A method of more general applicability than this divides each 
measure in each series by the average of the series, takes the 
difference between these two quotients for each case, adds these 
differences and divides by the number of cases. The resulting 
figure gives the average per cent of disagreement between the 
measures. The formula in detail is, 


$$D = \frac{\left(\frac{m_a}{Av_a}-\frac{m_b}{Av_b}\right)_1+\left(\frac{m_a}{Av_a}-\frac{m_b}{Av_b}\right)_2+\cdots}{n}$$
where m_a is a measure in the first series, m_b is a measure in the
second series, Av_a and Av_b are the averages of series a and b,
1, 2, etc. are the consecutive cases, and n is the number of cases.
Subtract the second term in each parenthesis from the first
algebraically, but disregard the resulting signs when adding. D is
taken as the index of agreement; since it is larger the less the
agreement, we may call it a measure of disagreement. The figures
under D in table 1 were obtained with this procedure. In this
method the two measures in the two series
must also be the same in kind, unless they give the same 
variability and distribution. D, however, is independent of the 
magnitude of the average, and the method applies equally at 
all mental levels. It is affected by differences in variability 
in the two series of measures, being larger when the range is 
shorter in one series than in the other. Whether this is a fault 
or a desirable trait depends on just what we wish to know. 
If we wish to know how well two test batteries, for example, 
agree in merely ranking the cases in the same way then it is a 
fault. But if we wish to know how well the measures them- 
selves in the two series agree in their absolute values then it is 
a desirable trait. So long as the two measures are the same 
in kind, both mental ages, or both IQ’s, for example, no change 
in formula so as to take account of a difference in variability in 
the two measures seems to be called for. 
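A minimal sketch of the method just described, assuming that the D’s in the tables are expressed as per cents: the text calls D an “average per cent of disagreement,” but the formula as printed carries no factor of 100. The function name and the example figures are illustrative only. Because each measure is divided by its own series average, D is unchanged when one series is rescaled, which is the sense in which it is independent of the magnitude of the average.

```python
def kuhlmann_d(a, b, percent=True):
    """Average disagreement D between two paired series of measures.

    Each measure is divided by the average of its own series, the two
    quotients for each case are subtracted, the signs are disregarded,
    and the results are averaged over the n cases. percent=True multiplies
    by 100 (an assumption about how the tabled D's are expressed).
    """
    if len(a) != len(b) or not a:
        raise ValueError("series must be non-empty and of equal length")
    av_a = sum(a) / len(a)
    av_b = sum(b) / len(b)
    d = sum(abs(ma / av_a - mb / av_b) for ma, mb in zip(a, b)) / len(a)
    return 100 * d if percent else d


# Hypothetical mental ages (months) from two batteries for three pupils.
a = [100, 120, 140]
b = [110, 115, 135]
print(f"D = {kuhlmann_d(a, b):.2f} per cent")
# Doubling every measure in one series leaves D at zero: D ignores the size of the average.
print(f"D against 2a = {kuhlmann_d(a, [2 * v for v in a]):.2f}")
```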

A slight change in this last procedure, so as to take account of a
difference in variability, makes it as universal in applicability
as the Pearson formula. This substitutes the deviation of each
measure from the average of its series for the measure itself, and
divides by some measure of deviation, such as the A.D., S.D., or
P.E. After the averages, the deviations and the measures of the
deviations have been computed, the formula then is:


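$$D = \frac{\left(\frac{d_a}{s_a}-\frac{d_b}{s_b}\right)_1+\left(\frac{d_a}{s_a}-\frac{d_b}{s_b}\right)_2+\cdots}{n}$$

where d_a and d_b are the deviations of the two measures on a case
from the averages of their series, s_a and s_b are the chosen
measures of deviation (A.D., S.D., or P.E.) of series a and b, and
signs are again disregarded when adding.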











This formula will not show the disagreement between two series of
measures that is due to differences in variability, and so, when
the two measures are the same in kind, it is not as desirable as
the preceding one.
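A sketch of this standardized variant, under the assumption that, as in the previous method, the absolute differences are averaged over the cases. The function name and data are hypothetical. It shows the property just noted: a series with the same order and average but half the range gives a standardized D of zero, although the average-quotient D of the previous method is about 5.6 per cent for the same pair.

```python
import math


def standardized_d(a, b, spread="sd"):
    """Disagreement after replacing each measure by (deviation / dispersion).

    spread="sd" divides by the standard deviation (taken over n);
    spread="ad" divides by the average (mean absolute) deviation.
    Dividing by the P.E. (0.6745 * S.D.) would just multiply the "sd"
    result by 1 / 0.6745.
    """
    def standardize(series):
        n = len(series)
        av = sum(series) / n
        devs = [m - av for m in series]
        if spread == "sd":
            s = math.sqrt(sum(d * d for d in devs) / n)
        elif spread == "ad":
            s = sum(abs(d) for d in devs) / n
        else:
            raise ValueError("spread must be 'sd' or 'ad'")
        return [d / s for d in devs]

    za, zb = standardize(a), standardize(b)
    return sum(abs(p - q) for p, q in zip(za, zb)) / len(a)


a = [100, 120, 140]
b_short = [110, 120, 130]  # same order and average, half the range
print(f"standardized D, a vs. shorter-range b: {standardized_d(a, b_short):.3f}")
print(f"standardized D, a vs. a reversed:      {standardized_d(a, a[::-1]):.3f}")
```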

These several methods and others have all been suggested before,
but discarded in favor of the Pearson formula. No doubt objections
may be raised to the three noted here. But the first two have the
basic merit of permitting each case to contribute to the index or
coefficient of correlation in proportion to the amount of
agreement between the two measures on that case, and their faults
are shared largely by the Pearson formula. The latter aims at
universal applicability and achieves universal inapplicability.

In the study referred to above, in which the Kuhlmann-Anderson
intelligence tests were compared with seven other test batteries
in common use, a number of Pearson r’s were computed. They
contradicted other evidence in so many instances and so markedly
that some of the results were gone over again, this time using the
second method named above to compute agreements between the
different series of measures. The D’s in the following tables
express the average difference in the quotients obtained by
dividing each mental age on a case by the average mental age for
the series. The data will afford a comparison between this D and
the Pearson r, corroborate some of the conclusions reached in the
previous paper, and give some new results that could not be
obtained by the Pearson formula. It should be remembered that D
gives a measure of disagreement and is small where the correlation
is close. Table 2 repeats the Pearson r’s and σ’s of the previous
study, and adds the corresponding D’s.

A study of the figures in this table reveals (1) that r is so
dependent on variability as given by σ₁ and σ₂ that few or no
legitimate comparisons can be made from r as to the relative
amount of correlation between the K.-A. and other tests, or as to
changes in this correlation from grade to grade. (2) There are
numerous and marked disagreements between r and D. Two instances
are of special interest. The average r for grades V to VIII for
the National and K.-A. tests is 0.705; for the Detroit Alpha and
K.-A. it is 0.743. But the average D is 5.57 for the former and
6.05 for the latter. For the Otis and K.-A. tests for grades III
and IV the average r is 0.575; for the Detroit Primary and K.-A.
it is 0.675. But the respective D’s are 6.25 and 7.48. In other
words, according to r both the Detroit Primary and the Detroit
Alpha agree more closely with the K.-A. than do the National and
the Otis. In both instances the facts are the opposite. From other
evidence given in the previous study it was concluded that the
Detroit tests were relatively poor although giving the higher
correlations with the K.-A. tests, these higher correlations being
due to the higher σ’s, and the higher σ’s being due to the large
number of trials that made up the Detroit tests. D here
corroborates this conclusion so far as a poorer real agreement
with the K.-A. tests goes.


TABLE 3

Pintner-Cunningham and K.-A.   29.63  13.39
                               11.11   9.46
Otis Pr. and K.-A.             16.94  12.31
                                9.29   6.78
[illegible] Pr. and K.-A.      33.91  23.12  16.16
                                9.18   7.79   6.53
[illegible] and K.-A.          25.70  20.30  17.74  13.95
                                5.98   8.07   7.90   8.72
Detroit Alpha and K.-A.        23.34  17.36  11.94  10.92
                                7.38   8.28   7.55   7.20
Terman and K.-A.               13.14
                                7.34

40 F. KUHLMANN

An attempt was made in the previous study to determine the relative merits of the different test batteries. This was done by computing the Pearson r between the median mental age on the odd-numbered tests of the K.-A. battery and the median mental age on the even-numbered tests. The same was done for each of the other batteries, but with the total raw scores on the half scales substituted. The variability was, of course, larger for the half scale than for the whole scale because of the smaller number of tests. It was smaller for the K.-A. tests because of the median mental age method of scoring, and the r's changed with the variability. This part of the study was not completed, as it obviously would not have solved the problem set.² On returning to the data, the D's were this time computed for the half scales. The odd-numbered tests formed one half and the even-numbered tests the other, and an unequal number of tests was kept in the two halves where the whole scale contained an odd number of tests. These D's are given in table 3.
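The half-scale split described above can be sketched as follows. This is a minimal illustration, not Kuhlmann's own procedure. It assumes that each test's raw score has already been converted to a mental age by the battery's norms (a step not shown), and the scores in the example are invented. With an odd number of tests, the odd-numbered half receives one more test, as in the text.

```python
from statistics import median


def split_odd_even(per_test_values):
    """Split a battery's per-test values into odd- and even-numbered halves.

    Tests are numbered from 1, so the odd half is tests 1, 3, 5, ... and the
    even half is tests 2, 4, 6, .... With an odd number of tests the odd half
    holds one more test than the even half.
    """
    odd_half = per_test_values[0::2]
    even_half = per_test_values[1::2]
    return odd_half, even_half


def total_raw_score(raw_scores):
    """Total raw score method (used for the batteries other than the K.-A.)."""
    return sum(raw_scores)


def median_mental_age(mental_ages):
    """Median mental age method (used for the K.-A.): the median of the
    mental ages earned on the individual tests of a half scale."""
    return median(mental_ages)


# Hypothetical child on a hypothetical seven-test battery.
raw = [12, 9, 15, 7, 11, 6, 4]
mental_ages_months = [130, 124, 136, 118, 128, 122, 120]

raw_odd, raw_even = split_odd_even(raw)
ma_odd, ma_even = split_odd_even(mental_ages_months)
print(total_raw_score(raw_odd), total_raw_score(raw_even))    # 42 22
print(median_mental_age(ma_odd), median_mental_age(ma_even))  # 129.0 122
```

Each child's pair of half-scale scores is what the half-scale comparison works on. How D itself is computed is defined in the author's previous paper and is not reproduced here.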

These figures reveal two very striking differences between the 
K.-A. tests and all others. First, the D’s are from nearly two 
to five times as large for the other tests as they are for the 
K.-A. There is no exception to this in the seventeen comparisons the table gives. Second, for the K.-A. tests the D's on the whole decrease only slightly from the lower to the higher grades. For all other tests they decrease very markedly. Both facts are brought out more strikingly in the following graphs. The figures on the right give the magnitude of D, and the school grades are indicated at the bottom. The letters at the left ends of the curves identify the tests. The numbers at the right ends of the curves identify the groups of children on whom the tests were made. For example, the curves marked P.-C.-1 and K.-A.-1 represent results from the same group of children.

The differences shown here are due to at least three factors. 


2 For example, dividing the Terman battery into halves with the 
even numbered tests in one half and the odd numbered tests in the other, 
gave a Pearson r between the raw score totals on the two halves of 
0.76. Doing the same with the K.-A. battery for results from the same 
cases, but using the median mental age method of scoring for the two 
halves, gave a Pearson r of 0.72. But in table 3 the D for the same 
data is 13.14 for the Terman tests and 7.34 for the K.-A. 
(1) The individual tests themselves that make up each battery. (2) The number of tests in the battery. (3) The method of scoring the individual subject: the median mental age method for the K.-A. and the total raw score for the other tests. The generally better agreement between the half scales for the K.-A. tests is probably due largely to the method of scoring. If no disturbing factors entered, we might have determined this by comparing the K.-A. tests with the others when the individual is scored by the total raw score in both. Some tests would also be dropped from the K.-A. battery, so that it contains the same number of tests as the others. But two disturbing factors appear when this is done. One is the quite unequal number of trials in the different tests of a K.-A. battery. The median mental age method weights each trial, and equality of number of trials from test to test is no advantage. The second is the fact that the tests in the K.-A. battery are purposely made of unequal general difficulty, being arranged from relatively easy to relatively difficult from first to last. The comparison can be made with least objection between the last K.-A. battery and the Terman tests. The K.-A. battery was reduced to the same number of tests as the Terman battery by dropping out the first and last tests. Each child's K.-A. results were then scored by the total raw score method, as for the Terman. This raised D between the two halves of the K.-A. battery from 7.37 to 11.12. The D for the Terman tests is 13.14. That is, dropping two tests and abandoning the median mental age method of scoring for the K.-A. tests raised the average disagreement between half scales by 3.75 points, or 51 per cent.
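The percentage in the last sentence is the increase in D expressed as a fraction of the original K.-A. value. The following is a quick check of that arithmetic, using the two D values stated in the text:

```python
d_median_ma_full = 7.37  # D for the K.-A. half scales, median mental age scoring
d_raw_reduced = 11.12    # D after dropping the first and last tests, raw-score scoring

increase = d_raw_reduced - d_median_ma_full
percent_increase = 100 * increase / d_median_ma_full

print(f"increase: {increase:.2f} points ({percent_increase:.0f} per cent)")
# increase: 3.75 points (51 per cent)
```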

The marked decrease in D for all but the K.-A. tests in going from lower to higher grades is of special interest. The agreement between results on the two halves of each battery is strikingly better for the higher grades. From this alone one would naturally infer that the test battery as a whole works better for the higher grades. But this would contradict much of the other data. Besides, this better agreement for the higher grades can be explained in the same way as some of the other data in my previous paper. These test results show little or no better agreement with the K.-A. test results for higher grades than for lower (see table 2). Consider the number of zero and maximum scores these tests give on the individual tests of the battery at different grades. It does not indicate that their applicability would always improve so markedly in going from lower to higher grades. And the middle points of the age norms for these tests lead one to expect best applicability at different grades for the different batteries. With this the present result agrees only in the case of the Detroit Alpha. The age norms for these tests alone indicate that they might work best in grade VIII. By these criteria the National should fit best between grades V and VI, above which it should give poorer results. But as regards an increasing agreement between half scales from lower to higher grades, the National and Detroit Alpha show almost identical results. This increasing agreement is therefore not an expression of better performance of the test battery as a whole in the higher grades.

Its correct explanation is probably the same as the one I gave in my previous paper for another fact. There, the raw scores on each individual test in these batteries decreased in variability from case to case in passing from lower to higher grades, while the variability of the mental ages increased from lower to higher.
The raw scores decreased in variability, it was held, because 
children will adjust their effort to the difficulty of the test. A 
test difficult enough to require maximum effort to pass only a 
few of its many trials will call out maximum effort. One that 
is easy enough to require only a little effort to pass most or 
many of its trials in the time allowed will call out a corre- 
spondingly lower degree of effort. Relatively difficult trials, 
but with a longer time allowed on the test, so that the child 
will pass a considerable number of them, may have the same 
effect on lowering the child’s effort as will relatively easy trials 
in a test and shorter time allowance. Besides this, a limit is 
somewhere reached in every test where the mechanical aspects of the tests (motor processes and minimum association time, for example) make it more or less impossible to pass more trials in the allotted time. These two factors are met in increasing
degree with any given battery of tests as the tests are applied 
to higher mental levels. The first effect is a reduced dis- 
crimination between bright and dull children, expressed in 
lowered variability in raw scores. 

But this same factor will also make the raw scores on the two 
halves of the scale agree more closely as they are applied to 
higher grades. When tests are relatively difficult they call 
into play different mental processes according to the nature of 
the test, and different processes for different children according 
to their mental levels. When these same tests have become relatively easy, because they are applied at higher mental levels, the mental processes they call into play are more uniform from test to test, as well as more uniform from child to child. They have become speed tests. Consequently there is closer agreement between the results from the two halves of a test battery.

Again a superficial analysis might lead one to infer from this 
that the mental ages the test results give at the higher levels 
would be more uniform, show less variability than at the lower 
levels. They do the opposite, and for two reasons. First, 
when tests call out much less than maximum effort on the 
whole, that is for most children, there is given the possibility 
of wide variations in effort with which different children will 
work. A few will work at top speed, while a few others will 
work with minimum effort. The approximately maximum 
effort that relatively difficult tests call out means approxi- 
mately uniform effort from child to child. Anything less than 
this means variability in effort. Secondly, the variability in 
raw scores for the relatively few cases resulting from this 
variability in effort is magnified several times when the raw 
scores are transmuted into mental ages. The raw score age 
norms for the tests have been lowered at the higher levels 
because of this lower effort on the part of most of the children 
that figured in the standardization of the tests. Consequently 
from this alone any particular child that works at his best 
will earn a mental age on the tests that is much above his true 
mental level. In addition, the year-to-year increments in the raw score age norms have become much smaller at the higher levels. A relatively small increase in a child's raw score therefore now means a much larger increase in mental age.

The foregoing results and discussion lead to two conclusions 
of fundamental and far-reaching importance in the construction 
and evaluation of intelligence test scales. The first is that no 
test can give uniformly good results at several different age 
levels. When a battery of tests is applied to four or five 
different school grades it is likely to function very poorly at 
one point or another, compared with its performance at the 
mental level for which it fits best. The occurrence of zero and 
maximum scores on the separate tests in such a battery can 
easily be avoided by having the tests consist of many relatively 
easy trials, and by allowing a short time for their performance.
Continued increase in total raw scores on the battery from one 
age level to the next is also easy to secure. This gives the 
battery the appearance of a wide range of applicability and 
satisfactory operation. But these are the very traits of a 
mental test that make it a poor test. 

Second, such a battery of tests will show a high degree of 
validity and reliability, as determined by the usual Pearson 
correlation formula. The poorer the tests in the traits here
under consideration the higher will be the Pearson r between 
the test’s results and another criterion, and between the two 
halves of a battery of such tests. 





A MENTAL ALERTNESS EXAMINATION FOR THE 
WORKING AGE LEVEL 


RICHARD S. SCHULTZ


Columbia University 


This paper is a study of the Viteles Mental Alertness Examination for the Working Age Level, or T-100. The test was devised by Dr. Morris S. Viteles, University of Pennsylvania, primarily for the selection of junior employees and for guidance at the working age level.

The contents and nature of the various tasks are adapted for measuring intelligence at the working age level. The tasks are comparable to the practical situations that a boy or girl would be likely to meet in everyday life, at work or in applying for a job. For this reason the material is possibly of greater interest to the boy or girl. The contents of the examination, the self-administrative feature and the twenty-minute time limit make T-100 a practical instrument for the measurement of intelligence in the vocational guidance bureau and in the employment office.


DESCRIPTION OF THE VITELES MENTAL ALERTNESS TEST AND 
METHOD OF SCORING 


T-100 includes eight types of mental tests, viz.: simple arithmetic problems, language completion, multiple choice, directions, information, same-difference, judgment and analogies. The form consists of eight pages, each about the size of an ordinary book leaf. There are 58 items on the blank. The first eight of these are samples, typical of the tasks throughout the examination.

T-100 is essentially a self-administrative group test. It may also be reliably adapted as an individual test to meet practical demands. Directions are clearly stated at the beginning of the first page. The examiner need only distribute the blanks and give the signal to begin. The twenty-minute time limit used in this study seems to be optimal.

The maximum score on this test is 100. Each item has a value of two (2) if correct and zero (0) if wrong or omitted (within the range of attempts). There are two exceptions to this scoring method:

(a) "Same and Difference" tasks are scored R minus W (rights minus wrongs), with no score less than zero or greater than two.

(b) "Incorrectly spelled" tasks are credited +1 for each misspelled word corrected and −1 for each correctly spelled word changed, with no score less than zero or greater than two.

With this simple method of credit it is possible to score the test in 1.5 minutes.


SUBJECTS 


The data to be considered are drawn from two distinct groups, totaling 392 cases. The major group¹ of this study consists of 293 boys and girls from two public schools in the same section of the city. They range in age from 10 to 16 years. Four classes were tested in each school: a 6B, 7A, 8A and 8B grade. The two schools will be referred to as “X” and “Y.” School “X” includes mostly Italian children, with a small percentage of Jewish children and a few Negroes. School “Y” has a large percentage of Jewish children and a small number of children of Polish, Russian, Spanish, Finnish and Negro origin. Most of the children in these groups were born in the United States. The parents represent a first generation of immigrants. They are engaged in occupations ranging from common to skilled labor, with a goodly number in “small trade.”


1 A more detailed study was made with this group by the same writer in a paper, “A Test for Motor Capacity in the Industries and in the School,” Jour. Applied Psychology, Vol. XII, 2, 1928, p. 169. This article also includes a sample of the Viteles Mental Alertness Examination for the Working Age Level. The data represent part of a study begun at the University of Pennsylvania. It was later completed and submitted in partial fulfillment of the requirements for a Master's degree at Columbia University.
The other group consists of 99 pupils in a trade school for girls. Besides the general academic studies (arithmetic, spelling, grammar, etc.), the course in this school includes special training in dressmaking, designing and fitting, millinery and machine operating. The median age is 16 years 7 months. The range runs from 13 years 6 months to 20 years, with only one case at each extreme. Most of them are girls who “could not get along so well in regular schools” or who were “not interested in regular school.”


TREATMENT OF DATA 


The following data represent a preliminary evaluation of the Viteles Mental Alertness Examination for the Working Age Level. Figures on the validity and reliability of this test are given first. They are followed by data on the school, race, age, grade, occupation and sex differences that appear in the results.


VALIDITY 


Three intelligence tests were used in this study. The Otis Intermediate Examination (grades 5-9) was the first test given to the regular public school group. About one month later the Viteles Mental Alertness Test for the Working Age Level (T-100) was administered to the same group.² Two months afterward an examination³ devised by the school authorities was given to the 8B grade pupils in each school.

With the results of these examinations it was possible to obtain a measure of the validity of T-100. Scores on this test were correlated with the scores on each of the other two tests. This method assumes that the Otis and City Examinations are good measures of intelligence.


2 The trade school girls had been given only T-100 at a previous time and will not be considered in this part of the paper.

3 The City Examination (Group Test of Mental Ability) is a combination of tests modelled very closely on the Army Alpha and Beta tests. It consists of direction items, multiple choice items, number completion series, picture completion, information items, and same-opposite items.
On the basis of the Otis Test as a criterion (table 1), T-100 has a high index of validity with a small probable error. The validity is slightly in favor of boys, but the difference is not reliable. Intelligence as defined by the content of the City Examination is less adequately measured by the Otis Test than by the Viteles Mental Alertness Examination for the Working Age Level. The ratio of 4.71 (table 2) makes this difference in favor of T-100 real and certain (a D/P.E.diff of 4.00 is usually considered reliable).⁴

TABLE 1
Correlation of T-100 with Otis examination*

GROUP             r       P.E.r    4 × P.E.r
Boys and girls    0.796   0.0153   0.0612
Boys              0.819   0.0191   0.0764
Girls             0.762   0.0240   0.0960

* Twenty-six examination blanks were not properly identified and will be disregarded in parts of the study.

TABLE 2
Comparison of T-100 and Otis Test as correlated with City Examination: 8B grade

GROUP                 TESTS CORRELATED              r       P.E.r    4 × P.E.r   D/P.E.diff
72 boys and girls*    City Examination and T-100    0.907   0.0140   0.0560      +4.71
82 boys and girls     City Examination and Otis     0.715   0.0383   0.1532

* Ten subjects did not take T-100.

4 Squaring the coefficients of correlation gives an index of the overlapping of test elements. The squares of these coefficients are subtracted (0.907² = 0.8226, 0.715² = 0.5112), which gives 0.3114. This indicates that T-100 measures 31.14 per cent more of the elements in the City Examination than the Otis Test does.
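The ratio in table 2 and the overlap index in footnote 4 can both be reproduced from the printed coefficients. The sketch below assumes the usual formula for the probable error of a difference between two uncorrelated coefficients, PE_diff = sqrt(PE1² + PE2²). The paper does not state which formula it used. Both coefficients also come largely from the same 8B pupils, so treating them as independent is only an approximation.

```python
import math

# Values from table 2 (8B grade, City Examination as criterion).
r_t100, pe_t100 = 0.907, 0.0140
r_otis, pe_otis = 0.715, 0.0383

difference = r_t100 - r_otis
# Probable error of a difference, treating the two r's as uncorrelated.
pe_difference = math.sqrt(pe_t100 ** 2 + pe_otis ** 2)
ratio = difference / pe_difference

# Footnote 4: difference between the squared coefficients.
overlap_gap = r_t100 ** 2 - r_otis ** 2

print(f"D/PE_diff = {ratio:.2f}")               # D/PE_diff = 4.71
print(f"r^2 difference = {overlap_gap:.4f}")    # r^2 difference = 0.3114
```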
RELIABILITY





The results so far have shown that T-100 correlates very highly with the criteria and is a valid instrument for measuring intelligence. To determine how consistently this test measures intelligence, a coefficient of reliability was computed (table 3) by correlating the scores on the odd items with the scores on the even items. In this way constant errors were avoided and every item attempted was included. The coefficient of reliability is high enough that we may be confident of consistent results under similar conditions and with the same group. Each individual would approximate his original position very closely on another trial of this test.

TABLE 3
Coefficients of reliability for T-100

NUMBER OF CASES   GROUP                             r       P.E.r    4 × P.E.r
293               Public school (boys and girls)    0.907   0.0069   0.0276
99                Trade school (girls)              0.889   0.0144   0.0576
115               Italian (boys and girls)          0.871   0.0155   0.0620
118               Jewish (boys and girls)           0.906   0.0113   0.0452
131               Boys                              0.917   0.0091   0.0364
136               Girls                             0.884   0.0133   0.0532
63                6B grade (boys and girls)         0.806   0.0295   0.1180
69                7A grade (boys and girls)         0.848   0.0224   0.0896
64                8A grade (boys and girls)         0.879   0.0190   0.0760
71                8B grade (boys and girls)         0.882   0.0175   0.0700
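The odd-even procedure can be sketched in three steps:

1. Total each pupil's credits on the odd-numbered items and on the even-numbered items.
2. Correlate the two totals across pupils.
3. Step the half-test correlation up to whole-test length with Spearman's (Spearman-Brown) formula, r_w = 2r_h / (1 + r_h).

This is a minimal illustration with a hand-written Pearson r. The answer-sheet layout is an assumption and the demonstration data are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Reliability of the whole test from the correlation of its two halves."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_credits_by_pupil):
    """Odd-even reliability. Each inner list holds one pupil's item credits
    in item order (item 1 first), so indices 0, 2, 4, ... are the odd items."""
    odd_totals = [sum(items[0::2]) for items in item_credits_by_pupil]
    even_totals = [sum(items[1::2]) for items in item_credits_by_pupil]
    return spearman_brown(pearson_r(odd_totals, even_totals))


# Invented credits for four pupils on six items.
pupils = [
    [2, 2, 2, 0, 0, 0],
    [2, 2, 2, 2, 2, 0],
    [0, 0, 2, 0, 0, 0],
    [2, 2, 2, 2, 2, 2],
]
print(round(split_half_reliability(pupils), 3))
```

Working backward, the reported whole-test coefficient of 0.907 corresponds to a correlation between the two halves of about 0.83, since 0.907 / (2 − 0.907) ≈ 0.8298.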

The coefficient of reliability, r = 0.907 ± 0.0069, is obtained for the entire heterogeneous group of 293 cases.⁵ On investigating the smaller⁶ homogeneous groups, we find the coefficient relatively high. Reliability tends to increase with higher grade. The consistency of measurement slightly favors boys over girls, and Jewish children over Italian children.

5 Spearman's formula for finding the reliability of the whole test from one application of the test. Here r_h is the correlation between the two halves and r_w is the reliability of the whole test:

$$r_w = \frac{2 r_h}{1 + r_h}$$
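Footnote 6 cites Kelley's formula without reproducing it. The sketch below assumes it is Kelley's relation for reliability in groups of different range, σ₁√(1 − r₁) = σ₂√(1 − r₂). Under that assumption, the reliability expected in a narrower group can be predicted from the whole group's σ and r. Applying it to the public school figures (σ = 16.17, r = 0.907) and the grade σ's of fig. 5 gives predictions close to the grade reliabilities in table 3. That agreement is consistent with, but does not prove, that this is the formula the author used.

```python
def kelley_predicted_r(sigma_ref, r_ref, sigma_group):
    """Predict a group's reliability from a reference group's, assuming
    Kelley's relation sigma_1 * sqrt(1 - r_1) = sigma_2 * sqrt(1 - r_2)."""
    return 1 - (sigma_ref / sigma_group) ** 2 * (1 - r_ref)


SIGMA_ALL, R_ALL = 16.17, 0.907  # public school group: fig. 2 and table 3
grade_sigmas = {"6B": 11.15, "7A": 12.61, "8A": 14.13, "8B": 14.34}  # fig. 5
reported = {"6B": 0.806, "7A": 0.848, "8A": 0.879, "8B": 0.882}      # table 3

for grade, sigma in grade_sigmas.items():
    predicted = kelley_predicted_r(SIGMA_ALL, R_ALL, sigma)
    print(f"{grade}: predicted {predicted:.3f}, reported {reported[grade]:.3f}")
```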








TABLE 4
Groups distinguished on basis of attempts, error and score

              PUBLIC SCHOOL      TRADE SCHOOL   ITALIAN            JEWISH
              (BOYS AND GIRLS)   (GIRLS)        (BOYS AND GIRLS)   (BOYS AND GIRLS)
Attempts*
  Average     30.96              35.85          28.91              32.12
  σ            8.62               7.83           8.33               8.24
  V           27.84              21.84          28.83              25.64
  Range       5-50               10-50          5-50               10-50
Error†
  Average     20.74              24.14          23.90              18.50
  σ           10.27              11.90          11.31               8.50
  V           49.54              45.11          47.34              45.92
  Range       0-65               5-75           0-60               0-50
Score
  Average     39.67              46.87          34.11              45.97
  σ           16.17              14.74          13.70              15.87
  V           40.77              31.44          40.18              34.13
  Range       0-90               5-80           0-70               10-90

V = coefficient of variability.
* Number of attempts: determined by the last item tried.
† Amount of error: number of attempts × 2 minus actual score.
GROUP DIFFERENCES

Three factors were compared (table 4) to determine what differentiates the groups in performance on the Viteles Mental Alertness Test for the Working Age Level: (1) the average number of attempts, (2) the average amount of error and (3) the average score. The coefficient of variability differentiates the groups uniformly in all three factors. The trade school girls are superior to the public school group in number of attempts and in score, but they make more errors, over a greater range. Italian children are inferior to Jewish children in every aspect of performance, with a consistent tendency toward greater variability or irregularity.

6 Kelley's formula for comparing reliability and its effectiveness in parts of the range:
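The V rows of table 4 behave like the coefficient of variability, V = 100σ/M. This expresses the standard deviation as a percentage of the average, so that groups with different averages can be compared. The minimal sketch below computes V from raw scores with the population standard deviation. That choice is an assumption, since the paper does not say which form of σ it used. The sketch also checks the definition against two printed entries of table 4.

```python
from statistics import mean, pstdev


def coefficient_of_variability(scores):
    """V = 100 * sigma / mean, using the population standard deviation."""
    return 100 * pstdev(scores) / mean(scores)


def v_from_summary(sigma, average):
    """The same coefficient from a printed sigma and average."""
    return 100 * sigma / average


# Table 4, public school group.
print(round(v_from_summary(8.62, 30.96), 2))   # attempts: 27.84
print(round(v_from_summary(16.17, 39.67), 2))  # score: 40.76 (table prints 40.77)

# Invented raw scores, to show the computation from individual results.
print(round(coefficient_of_variability([20, 30, 40, 50]), 2))
```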

Italian children make fewer attempts, more errors and lower scores than Jewish children (table 5). The difference is most pronounced for average amount of error in the 6B and 7A grades, and for average score in the 7A grade. The average number of attempts is almost the same in the 6B grade. In the other grades an irregular difference persists in favor of the Jewish group. The effect of schooling seems the most pertinent explanation of these differences.

TABLE 5
Relation of grade and race to average attempts, average amount of error and average score

                   6B GRADE          7A GRADE          8A GRADE          8B GRADE
                   JEWISH  ITALIAN   JEWISH  ITALIAN   JEWISH  ITALIAN   JEWISH  ITALIAN
Average attempts   24.44   24.01     30.58   27.70     33.21   32.34     36.92   32.50
Average error      15.56   30.25     15.32   27.42     18.04   21.21     17.80   22.69
Average score      28.88   23.71     40.58   25.30     49.46   43.31     52.50   47.40

The average number of attempts increases with grade; the
average amount of error decreases for Italian children, and in- 
creases slightly for Jewish children only when 6B and 7A grades 
are considered against 8A and 8B grades (fig. 1). The average 
score increases rapidly for Jewish children and more slowly for 
Italian children with grade. 
RELIABILITY OF GROUP DIFFERENCES

Significant group differences have been demonstrated. A study of the reliability of the differences between average scores (table 6) shows the same persisting tendencies. School and racial differences are decidedly in favor of the Jewish group. The reliability of the differences decreases with higher grade. When boys and girls are considered separately, grade differences are more reliable. A sex difference in the Jewish group persists in favor of boys. Between age groups the differences are not significant except between ages 11 and 12.

Fig. 1. Graphic Representation of Table 5, Showing Marked Relation of Grade and Race to Average Attempts, Average Amount of Error and Average Score


GRAPHIC DEMONSTRATION OF GROUP DIFFERENCES 


Throughout this paper an attempt has been made to demonstrate the manner in which performance on T-100 distinguishes the various groups. The differences discussed above are presented in graphic form in the succeeding sections.

TABLE 6
Reliability of differences between groups

GROUPS*                                        D/σdiff†   CHANCES IN 100 OF A TRUE DIFFERENCE
School:
  [label illegible]                            −1.95       97.44
  Trade school girls vs. public school girls   +4.31      100.00
Race:
  [label illegible]                            +5.98      100.00
Occupation:‡
  Business vs. non-business                    +0.83       79.67
Age:
  [label illegible]                            −1.88       96.99
  [label illegible]                            +0.22       58.71
  [label illegible]                            +0.79       78.52
  [label illegible]                            −0.10       53.98
  [label illegible]                            −1.16       87.70
  [label illegible]                            +0.55       70.88
  [label illegible]                            −0.05       51.99
  [label illegible]                            −0.03       51.20
Grade:
  [label illegible]                            −2.95       99.84
  [label illegible]                            −1.51       93.45
  [label illegible]                            −0.86       80.51
  [label illegible]                            −4.13      100.00
  [label illegible]                            −2.94       99.84
  Boys [grades illegible]                      −2.16       98.46
  [label illegible]                            −4.72      100.00
  [label illegible]                            −2.13       98.34
  [label illegible]                            −1.33       90.82
Sex:
  Public school (girls vs. boys)               −1.54       93.83
  [label illegible]                            −2.05       97.98
  [label illegible]                            −0.43       66.64
  [label illegible]                            −1.63       94.84
  [label illegible]                            −1.33       90.82

* A group includes boys and girls unless otherwise stated.
† Sign is to be regarded in relation to the first member of the comparison.
‡ This classification is recognized as arbitrary. It is based on occupations of parents and information given by the subjects.
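The last column of table 6 can be reproduced from the middle one by reading D/σdiff as a normal deviate. The chances in 100 of a true difference are then 100 × Φ(|D/σdiff|), where Φ is the standard normal cumulative distribution. The sketch below computes Φ with `math.erf`. The rows checked here agree with the table to two decimals, which suggests, though the paper does not state it, that this is how the column was derived.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def chances_in_100(d_over_sigma_diff):
    """Chances in 100 that the true difference lies in the observed direction.
    The sign of D/sigma_diff only shows which group is higher."""
    return 100 * normal_cdf(abs(d_over_sigma_diff))


for ratio in (-1.95, +0.83, -2.95, +4.31):
    print(f"{ratio:+.2f} -> {chances_in_100(ratio):.2f}")
```

For the four rows shown, this prints 97.44, 79.67, 99.84 and 100.00, matching the corresponding entries of table 6.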




a. Entire distribution 


The distribution for public school children (fig. 2) approximates a normal probability curve. There is a sudden drop between 45 and 50 that may be explained. At this interval the upper end of the distribution for Italian children has been reached and the cases become few. For Jewish children this is about the midpoint of the distribution. However, the distribution of Italian children, with which it is combined in figure 2, tapers off suddenly at this point, and the drop is therefore exaggerated.

Fig. 2. Entire Distribution of Scores on T-100, Public School Children: 293 Cases (Boys and Girls)
Average score = 39.67, σ = 16.17
b. Race differences

For Italian children (fig. 3) scores tend to fall in the lower portion of the scale, while for Jewish children the reverse is true. In terms of the coefficient of variability, the Jewish group is only 86 per cent as variable as the Italian group. Of the Jewish children, 74.9 per cent reach or exceed the average score of Italian children.

Fig. 3. Distribution of Scores on T-100: Italian and Jewish Groups (Boys and Girls)

          N     AV. SCORE   σ
Italian   115   34.11       13.70
Jewish    118   45.97       15.87

c. Age differences 


There is pronounced overlapping among the various age groups (fig. 4), and the scores fall within approximately the same range. The distributions tend to be bimodal and show the effect of too few cases. Differences are not significant except between ages 11 and 12. Variability increases with age.
d. Grade differences

School grade is a significant factor in differentiating performance on this test. The grade distributions (fig. 5), although covering a continuous spread, are distinct. Differences appear between all grades but are most reliable between the 6B and 7A grades. Variability decreases with higher grade.
Fig. 4. Age Distribution of Scores on T-100: Boys and Girls (Public School)

AGE   N    AV. SCORE   σ
11    32   38.75       14.63
12    68   44.78       15.89
13    82   43.30       15.34
14    58   37.33       16.43
15    20   39.25       17.22





The coefficient of correlation (table 7) also shows the interdependence of performance on T-100 and school grade. The probable error of the correlation of test score with age is so large as to make that coefficient negligible. Inspection of the correlation scatter diagrams for Jewish and Italian children separately indicates the same low relation of score to age. These scatter diagrams appear almost rectangular, the form characteristic of a very low or zero correlation.
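The probable errors in the tables appear to use the standard formula PE_r = 0.6745(1 − r²)/√N. The 4 × P.E.r columns reflect the usual rule that a coefficient is significant only when it exceeds four times its probable error. The short sketch below assumes N = 293 for the age correlation, which reproduces the printed P.E. of 0.0385 to within rounding.

```python
import math


def probable_error_r(r, n):
    """Probable error of a Pearson r: 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def exceeds_four_pe(r, n):
    """Rule of thumb reflected in the 4 x P.E.r columns: r > 4 * PE_r."""
    return abs(r) > 4 * probable_error_r(r, n)


pe_age = probable_error_r(0.144, 293)
print(f"PE = {pe_age:.4f}, r/PE = {0.144 / pe_age:.2f}")
print(exceeds_four_pe(0.144, 293))  # False: the age correlation is negligible
```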
When a partial coefficient of correlation is obtained with age held constant, the correlation between school grade and T-100 is not changed. This would imply that age can be disregarded as a factor affecting the correlation between school grade and test score on T-100.

Fig. 5. Grade Distribution of Scores on T-100: Boys and Girls (Public School)

GRADE   N    AV. SCORE   σ
6B      63   24.85       11.15
7A      69   37.43       12.61
8A      64   45.63       14.13
8B      71   50.74       14.34

TABLE 7
Correlation of T-100 with age and grade

GROUP                  RELATION          r       P.E.r    4 × P.E.r
293 (boys and girls)   Age and T-100     0.144   0.0385   0.1540
293 (boys and girls)   Grade and T-100   0.575   0.0273   0.1092
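The partial correlation mentioned above removes the influence of age from the grade-score relation. The usual first-order formula is sketched below. Table 7 does not report the correlation between age and grade, so the value used in the example is hypothetical and the result does not reproduce a figure from the paper.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# x = grade, y = T-100 score, z = age.
R_GRADE_SCORE = 0.575  # table 7
R_AGE_SCORE = 0.144    # table 7
r_grade_age = 0.30     # hypothetical: not reported in the paper

print(round(partial_r(R_GRADE_SCORE, r_grade_age, R_AGE_SCORE), 3))
```

With the hypothetical 0.30, the partial coefficient comes out at about 0.56, close to the zero-order 0.575. With the paper's actual age-grade correlation the figure would differ.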
e. Sex differences

Boys and girls score over the same range. The higher average score for boys is accompanied by greater variability. Girls are only 90 per cent as variable as boys (fig. 6).

Fig. 6. Distribution of Scores on T-100: Boys and Girls

        N     AV. SCORE   σ
Boys    131   41.47       17.31
Girls   136   38.46       14.46

f. Trade school distribution

Excluding the gap at 15-20, the distribution for trade school girls (fig. 7) follows a normal chance curve. The average score is reliably higher than the average for public school girls as well as for the entire public school group. The trade school girls are older, and many have been in school for a longer period. These facts may account for the higher average score.

Trade school girls are only 83.9 per cent as variable in performance on this test as public school girls. This may be accounted for by the less random selection of the group.
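Three statements about relative variability can be checked as ratios of coefficients of variability (100σ/M): girls are 90 per cent as variable as boys, the Jewish group is 86 per cent as variable as the Italian group, and trade school girls are 83.9 per cent as variable as public school girls. The check below uses the averages and σ's printed with figs. 3 and 6 and table 4. It reconstructs the comparison and is not necessarily the author's exact computation. In particular, the trade school figure comes out at about 83.6 rather than 83.9.

```python
def v(sigma, average):
    """Coefficient of variability, 100 * sigma / average."""
    return 100 * sigma / average


def relative_variability(group, reference):
    """How variable `group` is, as a fraction of `reference`.
    Each argument is a (sigma, average) pair."""
    return v(*group) / v(*reference)


boys, girls = (17.31, 41.47), (14.46, 38.46)      # fig. 6 (public school)
italian, jewish = (13.70, 34.11), (15.87, 45.97)  # fig. 3
trade_girls = (14.74, 46.87)                      # table 4

print(f"girls / boys:         {relative_variability(girls, boys):.3f}")
print(f"jewish / italian:     {relative_variability(jewish, italian):.3f}")
print(f"trade / public girls: {relative_variability(trade_girls, girls):.3f}")
```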
Fig. 7. Entire Distribution of Scores on T-100: Trade School Girls, 99 Cases
Av. score = 46.87, σ = 14.74

SUMMARY OF CONCLUSIONS

1. The Viteles Mental Alertness Test for the Working Age Level, or T-100, has a reliability coefficient of 0.907 ± 0.0069 P.E.

2a. The validity of this test is 0.796 ± 0.0153 P.E. with the Otis test as a criterion. On the basis of the City Examination, the validity of T-100 is 0.907 ± 0.0140 P.E.

b. For the 8B grade in this study, T-100 is a reliably better measure of intelligence (as defined by the City Examination) than the Otis test.

3. This test correlates 0.575 +0.0273 P.E. with grade. 

4. The correlation 0.144 +0.0385 P.E. with age is negligible. 

5. Race, school and grade groups are significantly differ- 
entiated by this test. 

a. Italian children are inferior to Jewish children. They 
attempt fewer items, make a greater amount of errors, obtain 
lower scores and are more variable in performance. 
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b. School ‘‘Y’’ (mostly Jewish children) surpasses school ‘‘X”’ 
(mostly Italian children). 

c. Trade school girls, older and with more schooling, are 
reliably superior to public school girls. 

d. Scores on this test increase directly with grade, showing more attempts and fewer errors.

6. Age and sex differences on this test are not very significant. 
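Conclusion 6 can be checked, roughly, against the Fig. 6 figures for boys and girls. The sketch below is an illustration under stated assumptions only: it uses the textbook probable-error formulas of the period (P.E. of a mean = 0.6745 σ/√N; P.E. of a difference between independent means = √(P.E.₁² + P.E.₂²)) and the common convention that a difference smaller than three times its P.E. is not reliable. The paper does not say which formulas its author used, and the helper names are illustrative.

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error (normal curve)

def pe_mean(sigma: float, n: int) -> float:
    """Probable error of a mean, textbook form 0.6745 * sigma / sqrt(N)."""
    return PE_FACTOR * sigma / math.sqrt(n)

def critical_ratio(mean_a, sigma_a, n_a, mean_b, sigma_b, n_b):
    """Difference between two independent means divided by its probable error."""
    pe_diff = math.hypot(pe_mean(sigma_a, n_a), pe_mean(sigma_b, n_b))
    return (mean_a - mean_b) / pe_diff

# Fig. 6 values for T-100: boys N=131, av. 41.47, sigma 17.31;
#                          girls N=136, av. 38.46, sigma 14.46
ratio = critical_ratio(41.47, 17.31, 131, 38.46, 14.46, 136)
print(f"difference / P.E. = {ratio:.2f}")
```

Under these assumptions the boys' advantage of about 3 points comes to roughly 2.3 times its probable error, short of the customary 3, which agrees with conclusion 6.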


GENERAL SUMMARY 


The self-administering feature of the Viteles Alertness Test for the Working Age Level, the nature of its contents, the 20-minute time limit and the simple scoring method would make this test a desirable instrument for measuring intelligence in the school, in the vocational guidance bureau and in the employment office.

The significant interdependence of performance on T-100 and school grade suggests that this test may be preferable for classification in the school. Its ease of administration, together with other factors, indicates its use with larger school groups.

A careful examination of the nature of the tasks in T-100 suggests that it is perhaps more interesting to the boy or girl at the working age level. Its items have an element of practicality that challenges and interests pupils of this age: problems in making change, or questions about practical situations, are certainly of immediate interest to them. These factors recommend this test as a desirable instrument for the vocational guidance bureau as well as the employment office.














MEMORY VALUE OF ABSOLUTE SIZE IN MAGAZINE ADVERTISING


S. M. NEWHALL AND M. H. HEIM
Yale University 


When one looks at a magazine, does one remember the larger advertisements better because they are absolutely larger? For the same reason, does better retention accompany the reading of large-sized magazines than of small? This problem of the influence of absolute size upon memory for advertising is one which does not appear to have been investigated before.¹ Consequently it seems well to distinguish it here from the existing studies of relative size.²


ABSOLUTE VS. RELATIVE MAGNITUDE 


To the advertising man, absolute size means simply the area of the advertisement as measured in standard units such as square inches. It is distinguished from relative size, which refers ordinarily to the fraction of the given page occupied by the advertisement. So, for instance, while quarter-page advertisements in media of different sizes are equal relative to the space available, they differ in actual area. Size relative to the size of the page, however, is no more to be considered than size relative to the size of other advertisements on the page. Perhaps relative magnitude should even include the size of pages or advertisements previously seen. In any event, the advertisements themselves are less significant visually than the corresponding retinal images, for the latter are the more dependable optic stimuli.

1 The writers are indebted to Professor A. T. Poffenberger for suggesting this stimulating problem.

2 For references to studies of the memory and attention value of relative size see: Starch, D., Principles of Advertising, 1923, Ch. 23; Poffenberger, A. T., Psychology in Advertising, 1925, Ch. 8.

In terms of retinal imagery, both size factors may be regarded as relative. Absolute extent would be directly relative to the total retinal or visual field if the image were directly proportional to the advertisement. The latter is true if the individual regularly views advertisements from the same absolute distance, and this condition is approximated in much focussed observation of periodicals. Of course there are individual differences, and the individual himself varies, especially with extremes of absolute size. Because of the variables upon which relative size depends, the magnitude of the image is partially independent of relative size. This second size factor can be regarded as relative to the image of the periodical page and of the other advertisements on the page. Briefly, mere extent of stimulation in the total field varies with absolute size, while amount of difference or size-contrast in stimulation is peculiar to relative size.
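The distinction between absolute and relative size, and the point that the retinal image follows absolute size only when the reading distance is constant, can be made concrete with a little geometry. A minimal sketch, assuming square pages and square advertisements viewed face-on, and a hypothetical 14-inch reading distance; none of these figures comes from the paper, and the function names are illustrative.

```python
import math

def visual_angle_deg(extent_in: float, distance_in: float) -> float:
    """Visual angle (degrees) subtended by a linear extent seen face-on at a distance."""
    return math.degrees(2 * math.atan(extent_in / (2 * distance_in)))

def describe_ad(page_area_sq_in: float, fraction_of_page: float, distance_in: float):
    """Return (relative size, absolute area, visual angle of one side).

    Assumes, for simplicity, a square advertisement on a square page.
    """
    area = page_area_sq_in * fraction_of_page   # absolute size in square inches
    side = math.sqrt(area)                      # linear extent of the square ad
    return fraction_of_page, area, visual_angle_deg(side, distance_in)

# Hypothetical quarter-page ads in a 50 and a 150 sq. in. magazine, both read at 14 in.
small = describe_ad(50, 0.25, 14)
large = describe_ad(150, 0.25, 14)
print(small)
print(large)

# Holding the larger magazine sqrt(3) times farther away equalizes the visual angles.
compensated = describe_ad(150, 0.25, 14 * math.sqrt(3))
print(compensated)
```

The two quarter-page advertisements have the same relative size (0.25) but absolute areas of 12.5 and 37.5 square inches. Only at a common viewing distance does the larger one produce the larger retinal image; if the larger magazine were held √3 times farther away, the two images would subtend the same angle.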


TREATMENT OF VARIABLES 


There are numerous factors present along with size, and many of them conceivably can contribute to the specific effect with which the investigator happens to be concerned. Those bearing a chance relation to size can ordinarily be made innocuous when one is attempting to discover the effect of size upon the particular response being measured. But there are other factors comparatively co-variant with size. Some of these appear to be such regular and altogether proper or unavoidable co-variants in actual advertising practice that more directly applicable results could apparently be secured by not attempting to eliminate them experimentally, but rather by counting them in with size and taking a measure of the combined effect. The advertiser is not ordinarily interested in, or helped by, knowing separately the contributions of factors which are bound to occur together in practice anyway. In regard to relative size, for instance, it is thought that the larger the advertisement, the less the competition for the reader’s attention from other advertisements on the page, simply because there must be fewer or smaller advertisements remaining. Studies of relative size in which all other advertisements or reading materials have been eliminated are usually not so directly applicable, because the usual advertising situation has been violated by eliminating a usual co-variant of magnitude. Naturally this type of statement assumes that the co-variant has some potency to influence the response in question. Similarly, in the case of absolute size, the area of the visual field surrounding the advertisement attended to is regularly heterogeneous and inversely related in size to the visual angle subtended by the advertisement. Since this is the case in the practical situation, it would seem well to let it remain so in the test situation. Of course, strictly speaking, any combined effect measured cannot be ascribed exclusively to size properly so-called, nor is it certain that nothing practical would ultimately be gained by separate measurements.

3 Poffenberger, op. cit., 175-6; Starch, op. cit., 578.


METHOD 


The measurements of the memory value of absolute size were made with the special case of the method of paired associates known as aided recall.⁴ By this method the subject is shown a series of advertisements of different commodities. Thereafter he is shown simply the names of the commodities, with the instruction to try to reproduce the corresponding trade-names. His score depends upon his demonstrated ability to do so. This method was chosen because it seems to approximate one of the most important advertising situations: if, when the need or commodity is in mind, the advertised brand comes up, then obviously a desirable association has been formed, or at least has functioned.

The same advertisements in three absolute sizes were em- 
ployed respectively in testing three comparable groups of sub- 
jects and the results were subsequently compared to see whether 
or not a differential was present. The testing and scoring were 
performed entirely by one of the authors. 


4 Poffenberger, op. cit., 512-514.

With our procedure, all the advertisements seen by any particular subject are full-page and of uniform absolute size. The purpose of the single size for a given subject is to avoid the possibility of a relative or contrast effect from successive exposure of different sizes. The elimination of this possible successive contrast was a departure from the usual magazine situation. With the exclusive use of full-page advertisements, relative size as a simultaneous factor might also be considered ruled out. Relative size, as defined, would then seem to be completely eliminated except insofar as the full page itself embodies relative size in the extreme.

The procedure might seem open to criticism on the ground that both relative and absolute factors are present in typical advertising situations but are not so here. There would be no difficulty if the two size factors were regular co-variants, since then, for reasons given above, the elimination of relative size would not be considered justifiable practically. Indeed, absolute size as a separate practical problem would disappear, and with it the question of eliminating relative size. It has been shown, though, that the variables are far from regular co-variants. Could we assume that they bear a chance relationship to each other, isolation should again be unnecessary, for then, on general principles, the one factor should not obscure the effect of the other. Since we could not assume that relative size is a chance variable, there seemed nothing to do but rule it out as far as possible. Any remaining influence due to the extreme relative size of our full-page materials might be assumed to operate equally, and so without disturbance, upon the results for each absolute size.


SUBJECTS AND MATERIALS 


One hundred magazines coming to hand at random were 
measured, their areas computed, and the range of size found to 
be approximately 50 to 150 square inches. The experimental 
areas were then set at 50, 100, and 150 square inches, partly to 
cover the indicated range for full-page advertising and partly 
to make feasible photostatic production of all three sizes from a 
single master set of advertisements. The master set was chosen 
from a large number of full-page advertisements in the Saturday Evening Post. Considerable effort was expended to select material adapted to the experiment but at the same time as random as possible with respect to accidental variables. Since the two smaller sizes were to be obtained by reduction, it was necessary to avoid illegibility by selecting against small type. This adjustment and the comparatively short range of sizes used helped to eliminate gross variation in reading distance. Since the aided recall method had been adopted, all copy failing to show clearly both commodity and trade-name was thrown out. Since the photostatic process reduces all colors to chiaroscuro, it was also important to reject advertisements in which color contrast without sufficient intensity contrast would cause confusion in the positive photostat. Of the advertisements finally
reproduced, 27 in each size were found to conform to our require- 
ments. Those of the smallest size, called Series 1, were enclosed 
between two plain cardboard covers by means of metal rings 
through reinforced holes in the sheets. Proportional covers 
were similarly used in constructing dummies out of the other 
two sizes, Series 2 and Series 3. 
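Because the two smaller series were photostatic reductions, their type shrank with the linear dimension, that is, with the square root of the area ratio. The 1 : 2 : 3 areas therefore correspond to line lengths of only about 0.58 : 0.82 : 1. The sketch assumes, for illustration only, a hypothetical 8-point line of type as it appears in Series 3; the constant and function names are illustrative.

```python
import math

AREAS_SQ_IN = {"Series 1": 50, "Series 2": 100, "Series 3": 150}

def linear_factor(area: float, reference_area: float) -> float:
    """Photographic reduction scales lengths, so the linear factor is sqrt(area ratio)."""
    return math.sqrt(area / reference_area)

# Hypothetical: an 8-point line of type as it appears in the 150 sq. in. Series 3.
TYPE_POINTS_SERIES_3 = 8.0
for name, area in AREAS_SQ_IN.items():
    f = linear_factor(area, AREAS_SQ_IN["Series 3"])
    print(f"{name}: area {area} sq. in., linear factor {f:.3f}, "
          f"type {TYPE_POINTS_SERIES_3 * f:.1f} pt")
```

On this assumption the hypothetical 8-point type falls to about 6.5 points in Series 2 and 4.6 points in Series 1, which shows why copy with small type had to be rejected.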

A stop-watch and a pendulum were employed for controlling the exposure times in the aided recall test. The scoring of the test was facilitated by using form sheets giving the names of the commodities and space for the subjects’ responses.

The name of each commodity with the corresponding brand 
was typed on a separate blank card. These cards, as explained 
in the next section, were used for estimating the subjects’ 
familiarity with the trade-names. The point here was to see 
whether the effect of size would differ according to familiarity. 
Large size might prove better for introduction but small size 
equally good for reminder purposes. 

The 135 subjects of this experiment were taken, with a few 
exceptions, from the undergraduate classes of Yale College in 
the academic year 1927-1928. Each class was about equally 
represented. A random sampling of the potential market seemed quite unnecessary for this type of problem. In general, when requested to act as subjects, the students consented with alacrity, and after participation they usually showed an interest gratifying to the experimenter.


DETAILS OF PROCEDURE 


Three rates of presentation were used for each size of advertise- 
ment. Any individual subject was employed at but one rate 
and but one size. The durations of exposure were two, five, 
and fifteen seconds. The writers felt that relatively brief exposures would better represent the usual practical situations and also that they would be more likely to disclose any differentiation that might exist. Furthermore, leafing through at the two-second rate might require more of perception or attention, while exposure at fifteen seconds might be more of a memory measure and, conceivably, yield a different sort of numerical result. It seemed interesting to try several rather short durations.
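The design is a three-by-three arrangement of sizes and exposure times, with each subject in exactly one cell: 15 subjects per cell, 135 in all. The sketch below assumes random allotment of subjects to cells, which the paper does not state (it says only that the groups were comparable); the function name and the seed are illustrative. It also shows the viewing time implied for the 27 pages of a dummy at each rate, ignoring time spent turning pages.

```python
import itertools
import random

SIZES_SQ_IN = (50, 100, 150)   # Series 1, 2, 3
EXPOSURES_SEC = (2, 5, 15)
N_ADS = 27                     # advertisements per dummy magazine
PER_CELL = 15                  # subjects per size x exposure combination

def assign_subjects(subject_ids, seed=0):
    """Allot each subject to exactly one (size, exposure) cell.

    Random allotment is an assumption made for this illustration.
    """
    cells = list(itertools.product(SIZES_SQ_IN, EXPOSURES_SEC))
    if len(subject_ids) != len(cells) * PER_CELL:
        raise ValueError("expected 15 subjects per cell")
    shuffled = list(subject_ids)
    random.Random(seed).shuffle(shuffled)
    return {cell: shuffled[i * PER_CELL:(i + 1) * PER_CELL]
            for i, cell in enumerate(cells)}

design = assign_subjects(range(1, 136))   # 135 subjects in all
for exposure in EXPOSURES_SEC:
    # total exposure for one run through the dummy (page-turning time ignored)
    print(f"{exposure:>2} s per page -> {N_ADS * exposure} s for {N_ADS} pages")
```

The three rates thus imply roughly 54, 135 and 405 seconds of total exposure per subject.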

After an individual had consented to act as subject in a psychological experiment, he was seated at a desk. One of the three dummy magazines was placed before him unopened, to remain so until the instructions had been read to him and it had been ascertained that they were understood.
The remaining details of procedure are evident from the 
instructions as quoted below. 


Instructions


“This is a psychological experiment to test the effectiveness of certain types of advertising. It consists of three parts.


Part 1


“I have here a folder representing a magazine and its contents. The folder contains a number of full-page advertisements. You are to leave it on the desk before you, and look over the advertisements as you might those which would interest you in a magazine. The speed at which you turn the pages, however, will be regulated. At the proper time interval, I shall tap with my pencil and you are to turn the page promptly on that signal. Do not immediately turn the first page, but wait until you can familiarize yourself with the time interval, one of which I shall now give you. Please turn the pages carefully to avoid tearing. Are there any questions?”
Part 2


“That completes the first part of the experiment. I have here a list of the commodities which have been represented in the advertisements. You are to write in the spaces provided for them the trade-names of the commodities and a brief description of any picture which might have accompanied them. The trade-names must be the ones in these advertisements. Do not guess! It is desired that you first go down the entire list of commodities in order, then likewise describe the pictures. If you cannot recall the trade-names but do remember the pictures, please describe them, and vice versa. You will have 10 minutes in which to recall these data. If you complete the two lists before that period is spent, you may then work at random during the remainder of your time, but guard against needlessly embellishing your descriptions of the pictures. Are there any questions? Proceed when I say ‘Go.’”


Part 3 


After the aided recall test the subject was given the deck of cards 
bearing the trade-names paired with the commodities and requested to 
arrange the cards in order of merit with respect to degree of familiarity 
with the brand names. 


The manner in which the subject observes the advertisements 
has been frequently referred to as a source of error in experi- 
ments of this character. Sometimes it is possible to defer the 
drafting of subjects until after they have examined the material 
in the natural course of events, thus assuring spontaneous attention. At other times the subject is deliberately told what he is going to be tested on, with the result that his attention is artificially focussed and restricted from the start. Though this kind of artificiality might not mask an effect of the kind sought in our experiment, it would certainly tend to alter its degree. It seemed best, here as elsewhere, to try to avoid departure from the advertising situation.
Therefore, assurance was always required that the experiment 
was entirely unfamiliar to the subject and the instructions to 
Part 2 were never given until after Part 1 had been completed. 


RESULTS 


The results consist of two scores obtained from each individual paper, for it will be remembered that the subject was asked not only to recall the trade-names but also to give brief descriptions of the accompanying pictures. The trade-names were scored as follows: perfect, 3; misspelled, 2; recognizable, 1; substitute, 0; none, 0. The pictures were naturally more difficult to score. Some individuals described them tersely but accurately. Others were more elaborate and equally accurate. Again, there were some who not only described the picture exactly but also added fanciful details. In a few cases the description was so meagre as to be barely recognizable. The following schedule was utilized: perfect, 3; perfect plus fanciful details, 2; recognizable, 1; substitute, 0; none, 0. Each answer was scrupulously considered, and the marks were submitted to a careful revision, but this second process revealed no differences which might affect the significance of the aggregate results.
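The two marking schedules can be written down as lookup tables. A minimal sketch: the grade labels paraphrase the schedules, with "none" taken to mean that no answer was given, and the function name is hypothetical. The grading itself was a matter of judgment; only the conversion of grades to points is mechanical here.

```python
TRADE_NAME_POINTS = {"perfect": 3, "misspelled": 2, "recognizable": 1,
                     "substitute": 0, "none": 0}
PICTURE_POINTS = {"perfect": 3, "perfect plus fanciful details": 2,
                  "recognizable": 1, "substitute": 0, "none": 0}
MAX_SCORE = 3 * 27   # 27 advertisements, at most 3 points each

def paper_scores(trade_name_grades, picture_grades):
    """Return (trade-name score, picture score) for one subject's paper.

    Each argument lists the examiner's grade for every commodity.
    """
    trade = sum(TRADE_NAME_POINTS[g] for g in trade_name_grades)
    picture = sum(PICTURE_POINTS[g] for g in picture_grades)
    return trade, picture

# Hypothetical grades for four commodities
print(paper_scores(["perfect", "misspelled", "none", "recognizable"],
                   ["perfect plus fanciful details", "none", "perfect", "substitute"]))
```

With 27 advertisements the highest possible score on either item is 81.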

Table 1 gives the scores for each paper, the sums of these 
scores for each of the three series at each time of exposure, and 
the corresponding averages. The averages, as might be ex- 
pected, increase regularly with the duration of exposure. 
Regardless of exposure, however, there is no consistent tendency 
for the average scores to either increase or decrease with size of 
series. While the picture scores at the two second rate do 
increase with size, the trade-name scores at the fifteen second 
rate decrease. The one set of figures appears in the lower left 
corner of the table and the other set at the right middle. The 
results as a whole seem to be random. When distribution 
curves were constructed, with scores along the baselines and 
frequencies as ordinates, they were found to be irregular; but 
any three curves based on the same duration of exposure and 
scoring item exhibited extensive overlapping. 
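Each average in table 1 is its aggregate divided by the 15 papers in the cell. The sketch below recomputes the averages from the printed aggregates and regroups them by exposure, which makes the two monotone runs mentioned above easy to see. The recomputed values differ from a few printed averages by less than 0.1 (for example 493/15 = 32.87 against a printed 32.8), presumably through the original rounding.

```python
# Aggregates (sums of 15 papers) from table 1, in the order
# Series 1, 2, 3 at 2 s; Series 1, 2, 3 at 5 s; Series 1, 2, 3 at 15 s.
TRADE_NAME_AGGREGATES = [323, 425, 331, 487, 547, 493, 692, 678, 655]
PICTURE_AGGREGATES = [309, 365, 408, 508, 459, 532, 752, 842, 647]
PAPERS_PER_CELL = 15

def averages(aggregates, n=PAPERS_PER_CELL):
    """Average score per paper for each cell, to two decimals."""
    return [round(total / n, 2) for total in aggregates]

def by_exposure(values):
    """Regroup the nine cells as {exposure: (Series 1, Series 2, Series 3)}."""
    return {exp: tuple(values[i:i + 3]) for exp, i in zip((2, 5, 15), (0, 3, 6))}

trade = by_exposure(averages(TRADE_NAME_AGGREGATES))
pictures = by_exposure(averages(PICTURE_AGGREGATES))
for exp in (2, 5, 15):
    print(f"{exp:>2} s  trade names: {trade[exp]}  pictures: {pictures[exp]}")
```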

To see whether or not more statistical refinement would bring out some dependency between absolute size and aided recall, the calculations summarized in table 2 were performed. The “significance” of the difference between each pair of averages at each exposure time is shown. As table 1 clearly indicates, each average is of but 15 cases; obviously, more cases would be appropriate for this kind of treatment. It is evident from the last column in table 2 that only one (the last) difference is more than three times its probable error, and this one is not based on the extreme size-difference. In general, the difference corresponding to Series 1 and 3 is at least as insignificant as that corresponding to the lesser size-differences between Series 1 and 2 or 2 and 3. Where the difference is in the same direction as the increase of size, the relation is regarded as positive; where it is in the opposite direction, a negative sign is appended. The disposition of the signs does provide slight support for the speculation that different exposure times might yield opposed results: there is a suggestion of a direct relation between size and recall under the shorter exposures, with an inverse relation under the longer exposures. But the limited degrees of probability, the fact that the greater size-difference gives results as insignificant as the lesser, and the more general tendency of the positive and negative values to balance indicate the absence of any ponderable dependency between size and recall. This statement applies to both the picture and the trade-name data as secured under the present conditions. If least-squares lines were drawn through plotted points representing, respectively, the six sets of averages, their slopes would be susceptible of the same interpretation.


TABLE 1
Scores in aided recall of three absolute sizes each at three durations of exposure

Series                 1     2     3     1     2     3     1     2     3
Exposure (sec.)        2     2     2     5     5     5    15    15    15

Trade names
Scores
 1                     5     3     6    15     3    12    12    21    12
 2                     6     9    11    17    18    21    23    23    21
 3                     9    18    15    18    23    21    33    31    25
 4                    14    18    15    18    30    23    33    33    26
 5                    16    18    17    18    31    27    39    34    32
 6                    18    21    20    18    32    28    39    34    45
 7                    21    27    21    27    35    30    40    39    47
 8                    21    27    21    29    39    34    43    48    48
 9                    27    32    22    30    42    37    46    48    49
10                    28    33    24    30    46    39    53    48    51
11                    29    33    24    45    47    39    64    59    52
12                    29    36    24    51    47    39    65    60    54
13                    33    45    26    57    47    40    65    62    62
14                    33    48    30    57    48    49    68    66    65
15                    34    57    55    57    59    54    69    72    66
Aggregates           323   425   331   487   547   493   692   678   655
Averages            21.5  28.3  22.1  32.5  36.5  32.8  46.1  45.2  43.6

Pictures
Scores
 1                     9     9     8    12     9    10    12    29    19
 2                    14    12     9    19    13    18    35    30    23
 3                    14    12    15    20    15    20    38    35    26
 4                    15    15    17    27    18    23    47    41    36
 5                    16    18    21    29    21    25    47    45    42
 6                    17    20    22    30    28    26    48    51    43
 7                    17    21    24    36    30    31    50    56    44
 8                    18    22    32    38    30    40    50    59    45
 9                    20    22    32    39    30    42    50    65    48
10                    21    24    34    39    32    46    55    66    48
11                    23    30    36    41    37    47    57    68    49
12                    24    32    37    42    39    49    60    68    50
13                    27    39    38    44    42    51    67    75    50
14                    31    40    41    46    55    51    67    75    58
15                    43    49    42    46    60    53    69    79    66
Aggregates           309   365   408   508   459   532   752   842   647
Averages            20.6  24.3  27.2  33.9  30.6  35.5  50.1  56.2  43.2


TABLE 2
Unreliability of the differences in the aided recall of different absolute sizes

Series     Exposure   Respective      Difference   P.E. of      Difference /
           (sec.)     averages                     difference   P.E.
Trade names
1 and 2       2       21.5   28.3        6.8         3.07          2.21
1 and 2       5       32.5   36.5        4.0         3.89          1.03
1 and 2      15       46.1   45.2       -0.9         4.30         -0.21
1 and 3       2       21.5   22.1        0.6         2.34          0.26
1 and 3       5       32.5   32.8        0.3         3.64          0.08
1 and 3      15       46.1   43.6       -2.5         4.32         -0.58
2 and 3       2       28.3   22.1       -6.2         2.88         -2.15
2 and 3       5       36.5   32.8       -3.7         3.13         -1.18
2 and 3      15       45.2   43.6       -1.6         4.20         -0.38
Pictures
1 and 2       2       20.6   24.3        3.7         2.39          1.55
1 and 2       5       33.9   30.6       -3.3         3.06         -1.08
1 and 2      15       50.1   56.2        6.1         3.76          1.62
1 and 3       2       20.6   27.2        6.6         2.54          2.60
1 and 3       5       33.9   35.5        1.6         3.38          0.47
1 and 3      15       50.1   43.2       -6.9         2.96         -2.33
2 and 3       2       24.3   27.2        2.9         2.94          0.99
2 and 3       5       30.6   35.5        4.9         3.65          1.34
2 and 3      15       56.2   43.2      -13.0         3.69         -3.52
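The last column of table 2 is simply each difference divided by its probable error, and the remark about least-squares lines can be checked on the six sets of averages from table 1. A sketch using only the printed values: it flags differences larger than three times their P.E. and fits a least-squares slope of average score on area (50, 100, 150 sq. in.) for each set. With equally spaced areas the slope reduces to (Series 3 average - Series 1 average) / 100, so the middle size does not affect it. The function names are illustrative.

```python
import statistics

SIZES_SQ_IN = (50, 100, 150)  # Series 1, 2, 3

# Table 2: (data, series compared, exposure in seconds, difference, P.E. of difference)
TABLE_2 = [
    ("trade names", "1 and 2", 2, 6.8, 3.07), ("trade names", "1 and 2", 5, 4.0, 3.89),
    ("trade names", "1 and 2", 15, -0.9, 4.30), ("trade names", "1 and 3", 2, 0.6, 2.34),
    ("trade names", "1 and 3", 5, 0.3, 3.64), ("trade names", "1 and 3", 15, -2.5, 4.32),
    ("trade names", "2 and 3", 2, -6.2, 2.88), ("trade names", "2 and 3", 5, -3.7, 3.13),
    ("trade names", "2 and 3", 15, -1.6, 4.20), ("pictures", "1 and 2", 2, 3.7, 2.39),
    ("pictures", "1 and 2", 5, -3.3, 3.06), ("pictures", "1 and 2", 15, 6.1, 3.76),
    ("pictures", "1 and 3", 2, 6.6, 2.54), ("pictures", "1 and 3", 5, 1.6, 3.38),
    ("pictures", "1 and 3", 15, -6.9, 2.96), ("pictures", "2 and 3", 2, 2.9, 2.94),
    ("pictures", "2 and 3", 5, 4.9, 3.65), ("pictures", "2 and 3", 15, -13.0, 3.69),
]

# Table 1 averages for Series 1, 2, 3 at each exposure
AVERAGES = {
    ("trade names", 2): (21.5, 28.3, 22.1),
    ("trade names", 5): (32.5, 36.5, 32.8),
    ("trade names", 15): (46.1, 45.2, 43.6),
    ("pictures", 2): (20.6, 24.3, 27.2),
    ("pictures", 5): (33.9, 30.6, 35.5),
    ("pictures", 15): (50.1, 56.2, 43.2),
}

def reliable_differences(rows, threshold=3.0):
    """Rows whose difference exceeds `threshold` times its probable error."""
    return [(data, pair, exp, round(diff / pe, 2))
            for data, pair, exp, diff, pe in rows if abs(diff / pe) > threshold]

def ls_slope(xs, ys):
    """Least-squares slope of ys on xs (score points per square inch)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

print(reliable_differences(TABLE_2))
for key, avgs in AVERAGES.items():
    print(key, f"slope = {ls_slope(SIZES_SQ_IN, avgs):+.3f}")
```

Only the picture comparison of Series 2 and 3 at fifteen seconds is flagged. The slopes are positive at two and five seconds and negative at fifteen seconds for both trade names and pictures, matching the weak suggestion noted in the text, yet none exceeds 0.07 score points per square inch.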

When selecting the master set of advertisements, we had intended to secure a considerable range of familiarity, with emphasis, however, upon the unfamiliar. Quite conceivably, a relation between size and recall might be demonstrable in relatively unfamiliar material but not in familiar material, or vice versa. Material very familiar as a whole might lead to such high scores in all sizes as to mask any size effect. Even if size gave a real but slight advantage, that advantage might also be ineffective were all the materials quite unfamiliar and correspondingly resistant. Possibly, then, material of a certain degree of familiarity might show positive results which were hidden by the levelling effect of the more random range actually employed.

Following this line of speculation, we fractionated the scores of every subject on every trade-name into five short ranges of familiarity based on his order-of-merit ratings. Scores of the several subjects in each group were then recombined upon this admittedly crude basis of familiarity. The unpublished figures seem to show no relation between scores and size, regardless of degree of familiarity.
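The paper does not state how the order-of-merit ranks were cut into five ranges. Purely as an illustration, the sketch assumes that rank 1 is the most familiar brand and splits the 27 ranks into five consecutive ranges as nearly equal as possible (6, 6, 5, 5, 5); both functions and the sample data are hypothetical.

```python
def familiarity_ranges(n_brands: int = 27, n_ranges: int = 5):
    """Split ranks 1..n_brands (1 = most familiar, an assumption) into
    consecutive ranges whose sizes differ by at most one."""
    base, extra = divmod(n_brands, n_ranges)
    ranges, start = [], 1
    for i in range(n_ranges):
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges

def pool_by_familiarity(ranked_scores, ranges):
    """ranked_scores: (familiarity rank, trade-name score) pairs for one subject.
    Returns that subject's summed score within each familiarity range."""
    totals = [0] * len(ranges)
    for rank, score in ranked_scores:
        for i, r in enumerate(ranges):
            if rank in r:
                totals[i] += score
                break
    return totals

ranges = familiarity_ranges()
print([(r.start, r.stop - 1) for r in ranges])
print(pool_by_familiarity([(1, 3), (7, 2), (27, 1)], ranges))  # hypothetical subject
```

Per-subject totals of this kind could then be summed within each size group and compared range by range, as the text describes.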


Our chief result, though negative, is definite enough to lend a certain verification to the published data on “relative” size. Verification is desirable because of the earlier failure to separate the relative and absolute factors. For instance, both relative and absolute factors are present whenever more than one advertisement is in the field, and that was true of most of the earlier studies. Since absolute size is found to have no significant effect when isolated, the effect commonly attributed to relative size is presumably valid. Our analysis indicated that mere amount or extent of retinal stimulation is the primary effect of absolute size, and that contrasting amount of stimulation is that of relative size. Contrasting size appears to be an important condition of memory, while mere magnitude does not.


RELATED PROBLEMS 


There is confusion in the literature as to what are properly to be regarded as tests of attention and what as tests of retention. A priori reasons have been advanced, however, for considering certain tachistoscopic, fixation, and recognition methods as more nearly measuring attention, while pure and part recall methods measure memory more.⁵ Indeed, there is experimental evidence that the results of the recall methods correlate relatively highly with each other but low with the recognition and fixation methods, which, in turn, tend to correlate more highly with each other.⁶

Experimentation has shown, then, that the apparent effective- 
ness of size, color, or other factors actually varies according to the 
type of test employed. Therefore our experiment might be 
repeated using a recognition or tachistoscopic method to dis- 
cover whether absolute size has attention value even though it 
does not appear to have memory value. Still, there is reason for believing that such an experiment would also give negative results. Poffenberger concluded from the considerable available data on relative size that the more nearly the measure is one of attention only, the more closely is the square root law approximated; but that the more memory and various other factors enter into the measure, the more does the value of large space increase.⁷ On the analogy of the relative size data, then, a memory method should give a more pronounced result than an attention method. Our result with the method of aided recall, however, was negative. Hence there is some reason to anticipate a negative result with an attention method.

5 Op. cit., 182, 197-8.

6 Ibid., 197-8, 514-519, with references to E. M. Achilles, E. R. Brandt, and H. K. Nixon.

A desirable variation of our original procedure would be to 
employ a greater range of exposure times. This would tend to 
clear up the suspicion of an opposite size-effect under the 
shorter as compared with the longer exposures. 

An interesting extension of the original problem would be 
to use a much greater range of sizes to discover whether or not 
a differential would appear in extremes not reached in the experi- 
ment reported above. Simpler materials could be employed to 
avoid those difficulties of legibility and varied reading distance 
to be expected with extreme variation in size. Or, if regular 
advertisements were retained and no restriction on legibility or 
reading distance were exercised, other than to have the materials 
representative, the results would apply more directly to actual 
practice. The range of variation in reading distance, however, 
would then increase with resulting decrease in the range of 
absolute size of retinal images. Therefore, if the first procedure 
resulted in any differentiation the second would probably 
result in less. 


SUMMARY 


A selection of full-page advertisements was made with the aim 
of securing a set which would be as random and typical as 
feasible for the present purpose. 

Three series of these advertisements were prepared, each 
series identical except for size, and each bound in proportional 
covers to form a dummy magazine. 

The areas of the series vary as 1 to 2 to 3 and they include, 
or approximately include, the range of sizes of magazine pages. 

A retention value for each of the three series was determined by applying the method of paired associates to each of three groups of 45 subjects.

7 Ibid., 199.

Three durations of exposure, i.e., two, five, and fifteen seconds, 
were employed in presenting each advertisement in each series. 

An effort was made, in addition, to expose any considerable 
dependence of the effect of size on the familiarity or unfa- 
miliarity of the material. 

Within our experimental limitations, the results seem to indi- 
cate that the retention value of magazine advertisements is 
independent of their absolute magnitude. As far as trade- 
name recall is concerned in full-page advertising, at least, it would 
seem that periodicals of all sizes can be used with about equal 
effectiveness. 

Certain possibilities and allied procedures are suggested which 
might lead to qualification of the chief result. 








CORRELATION BETWEEN INTELLIGENCE QUOTIENT AND ACCOMPLISHMENT QUOTIENT


HARL R. DOUGLASS AND C. L. HUFFAKER


University of Oregon 


Ruch, Peters, Franzen, and others have one after another furnished data showing that, judged by the correlation between IQ and AQ, the brightest students are retarded and the dull students accelerated.¹ Such correlation coefficients have always been negative and apparently significant. Recently the question of the negative correlation between intelligence quotients and accomplishment quotients has been reopened.² In that paper it is pointed out that the fact of negative correlation is frequently misinterpreted, and an attempt is made to show that this phenomenon is a statistical necessity, bound to be obtained when correlations between IQ’s and AQ’s are calculated. In the article referred to, an attempt is made to establish the phenomenon as due to errors of measurement. It is the purpose of this paper to show that the obtained negative correlation is not due to errors of measurement but to the unique nature of the correlation coefficient between a variable and a ratio of which the first variable is the denominator.

1 Ruch, G. M. “The Achievement Quotient Technique.” Journal of Educational Psychology, 14: 334-43, September, 1923.

Peters, C. C. “A Method for Computing Accomplishment Quotients on the High School and College Levels.” Journal of Educational Research, 14: 99-111, September, 1926.

Pintner, R., and Marshall, Helen. “A Combined Mental-Educational Survey.” Journal of Educational Psychology, 12: 32-48, January, 1921.

Whipple, G. M. “Educational Determinism: A Discussion of Professor Bagley’s Address at Chicago.” School and Society, 13: 599-602, June 3, 1922.

Franzen, Raymond. The Accomplishment Ratio: A Treatment of the Inherited Determinants of Disparity in School Product. New York, Teachers College, Columbia University, 1922, p. 14 (Teachers College, Columbia University, Contributions to Education, No. 125).

2 Wilson, W. R. “The Misleading Accomplishment Quotient.” Journal of Educational Research, 17: 1-10, January, 1928.

In the article cited, the reader is asked to consider the situation of a group of pupils all of whose AQ’s are exactly one, the correlation between IQ and EQ consequently being 1.00 and that between IQ and AQ being zero. It is pointed out that if, owing to errors of measurement, the correlation between IQ and EQ is less than 1.00, the regression line of EQ on IQ approaches the line drawn at the mean of the EQ’s parallel to the IQ axis. Hence, for values of IQ below the mean IQ, an EQ estimated by means of the regression equation would tend to be nearer the mean of the EQ’s than the actual EQ, and therefore greater than the actual EQ. As a consequence, AQ’s obtained by dividing the estimated EQ by IQ will be larger than 1.00 for pupils below the mean in intelligence. Likewise, for pupils above the mean in intelligence, AQ’s obtained by dividing the estimated EQ by IQ would be less than the AQ obtained by dividing the actual EQ by IQ, and consequently less than 1.00. Thus, Wilson says, errors of measurement produce a negative correlation coefficient where no real correlation exists: where the real correlation is zero, a negative coefficient is actually obtained.
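To make the situation concrete, the following Python sketch (an illustration added here, not a procedure from the paper) simulates pupils whose true AQ is exactly 1.00, adds independent chance errors to IQ and EQ, and correlates obtained IQ with AQ computed from the obtained scores. The normal distributions, the mean of 100, the true S.D. of 15, the error S.D.’s, and the function names are assumptions chosen only for illustration.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulated_r_iq_aq(n=5000, true_sd=15.0, error_sd=8.0, seed=1):
    """r between obtained IQ and obtained AQ when every pupil's true AQ is 1.0.

    True EQ is set equal to true IQ; independent normal chance errors are then
    added to each score, and AQ is computed from the obtained scores.
    """
    rng = random.Random(seed)
    true_iq = [rng.gauss(100.0, true_sd) for _ in range(n)]
    obtained_iq = [t + rng.gauss(0.0, error_sd) for t in true_iq]
    obtained_eq = [t + rng.gauss(0.0, error_sd) for t in true_iq]  # true EQ == true IQ
    obtained_aq = [eq / iq for eq, iq in zip(obtained_eq, obtained_iq)]
    return pearson_r(obtained_iq, obtained_aq)


for err in (2.0, 8.0):
    print(err, round(simulated_r_iq_aq(error_sd=err), 3))
```

With these settings the printed coefficients are negative, and more strongly so for the larger error. AQ here is formed from the obtained EQ, not from an EQ estimated by regression, and the same chance error in obtained IQ appears both in IQ itself and in the denominator of AQ. The sketch shows only that a negative coefficient appears; by itself it does not settle which explanation of it is correct.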

Wilson’s conclusion, that the introduction of chance errors of measurement operates to produce a negative coefficient from data which but for such errors would yield a zero coefficient, does not conform to what we know to be the truth about the effect of chance errors upon the coefficient of correlation: namely, that chance errors of measurement make the obtained coefficient tend toward zero. The larger the relative influence of chance errors, the greater the regression toward zero. If, for example, the measures or scores are entirely the creatures of chance error, the coefficient of correlation must of necessity be zero. It should be remembered also that accomplishment quotients are actually calculated by dividing the obtained EQ by the IQ, not, as is done in the proofs given by Wilson and by Toops and Symonds,³ by dividing the EQ estimated by means of the regression equation. Further, it should be noted that if one employs the regression line of IQ on EQ instead of that of EQ on IQ (employed by Wilson), the exact opposite conclusion is reached: namely, that the AQ’s of slow pupils tend to decrease along with decreasing IQ’s and those of bright pupils to increase, so that errors of measurement would ensure a positive correlation coefficient where really no correlation exists.

The fact that IQ and AQ will, under the conditions which usually prevail in testing, correlate negatively can readily be shown, but for reasons other than those involved in the geometric proof given by Wilson. Holzinger⁴ gives the following formula for the correlation between ratios:

$$
r_{(x/y)(z/w)} = \frac{r_{xz}V_xV_z + r_{yw}V_yV_w - r_{xw}V_xV_w - r_{yz}V_yV_z}{\sqrt{\left(V_x^2 - 2r_{xy}V_xV_y + V_y^2\right)\left(V_z^2 - 2r_{zw}V_zV_w + V_w^2\right)}} \tag{1}
$$

in which $r_{xz}$ is the correlation between variables $x$ and $z$, and $V_x = \sigma_x / M_x$, etc. When this formula is used to find the correlation between IQ and AQ, let $x$ = IQ and $z$ = EQ. Then $r_{xz}$ is $r_{IQ,EQ}$, and since AQ = EQ/IQ, the coefficient sought, $r_{IQ,AQ}$, is $r_{(x/1)(z/x)}$, or simply $r_{x(z/x)}$.

When $y$ of equation (1) is 1, $V_y$ is zero, and all correlations involving $y$, such as $r_{xy}$, $r_{yz}$, and $r_{yw}$, are zero, since a constant has no variability and does not correlate with any variable. Since $w = x$, $V_w$ is identical with $V_x$, $r_{xw}$ is 1.00 because it is the correlation of $x$ with itself, and $r_{zw}$ becomes $r_{xz}$.

³ Wilson, W. R., op. cit.; Toops, H. A., and Symonds, P. M. “What Shall We Expect of the Accomplishment Quotient?” Journal of Educational Psychology, 13: 513-28, December, 1922.

⁴ Holzinger, K. J. “Formulas for the Correlation Between Ratios.” Journal of Educational Psychology, 14: 344-47, September, 1923.

With these substitutions equation (1) becomes

$$
r_{x(z/x)} = \frac{r_{xz}V_z - V_x}{\sqrt{V_z^2 - 2r_{xz}V_xV_z + V_x^2}} \tag{2}
$$

If we assume $\sigma_x = \sigma_z$ and $M_x = M_z$, and consequently $V_x = V_z$, which would be the ideal situation of the accomplishment quotient technique, equation (2) becomes

$$
r_{x(z/x)} = \frac{r_{xz} - 1}{\sqrt{2\left(1 - r_{xz}\right)}} \tag{3}
$$
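For readers who want the intermediate step, the reduction of equation (2) to equation (3) can be written out. The last form below is an algebraic rearrangement added here for illustration, valid for $r_{xz} < 1$ and $V_x = V_z = V > 0$:

$$
r_{x(z/x)} = \frac{V\left(r_{xz} - 1\right)}{\sqrt{2V^2 - 2r_{xz}V^2}} = \frac{r_{xz} - 1}{\sqrt{2\left(1 - r_{xz}\right)}} = -\sqrt{\frac{1 - r_{xz}}{2}}
$$

For example, $r_{xz} = 0.6$ gives $r_{x(z/x)} = -\sqrt{0.2} \approx -0.45$, and $r_{xz} = 0$ gives $-\sqrt{0.5} \approx -0.71$. The coefficient approaches zero only as $r_{xz}$ approaches 1.00.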


Under these conditions $r_{x(z/x)}$ (i.e., $r_{IQ,AQ}$) will be 0 when $r_{xz}$ (i.e., $r_{IQ,EQ}$) = 1.00, and negative when $r_{xz}$ is less than 1.00. In the general case of equation (2), $r_{x(z/x)}$ is negative whenever $V_z$ is less than $V_x$, for any value of $r_{xz}$. Only when $r_{xz}V_z$ is greater than $V_x$ will $r_{x(z/x)}$ be positive. This condition obtains only when $r_{xz}$ is positive and high and $V_z$ is materially greater than $V_x$, a very unlikely combination of circumstances. Much more reasonable is an approximate equality of $V_x$ and $V_z$ and a value of $r_{xz}$ not greater than 0.6 or 0.7. That a negative coefficient is practically a necessity is attributable to the fact that $x$ and $1/x$ are correlated negatively (perfectly negatively as measured by the correlation ratio), so that $r_{x(z/x)}$ could be positive only when $r_{xz}$ is 1.00 or approximately 1.00.

As a rule, the tendency in actual practice is for $M_x$ to equal $M_z$ and for $\sigma_z$ to be less than $\sigma_x$, except when the reliability of $z$ is low.⁵ In all such cases $r_{x(z/x)}$ must of necessity lie between 0 and −1.00.


It is because of this relationship that a negative correlation between IQ and AQ is a necessary result, except in the very improbable instance where IQ and EQ are perfectly correlated and each is measured by perfect instruments of measurement. In such an instance the expression in equation (3) becomes zero. To be sure, this does not preclude the possibility of an accidental small positive correlation between IQ and AQ in instances where, with finite data, chance errors are not distributed in normal fashion but are actually correlated with true intelligence or true achievement.

⁵ Ruch, G. M. “The Achievement Quotient Technique.” Journal of Educational Psychology, 14: 334-43, September, 1923.

It is because of the necessity of the relationship given in equation (3) that Wilson found that introducing chance errors of measurement into a situation where, for each individual, IQ = EQ and $r_{IQ,AQ}$ = 0 changed that correlation from zero to a negative coefficient.
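Equations (1) to (3) can be checked numerically with a short Python sketch. It assumes that every `v_*` argument is a coefficient of variation ($\sigma / M$) and every `r_*` argument is a correlation between the two named variables. The function names and the sample values (0.6, 0.15, 0.12, and so on) are hypothetical, chosen only to exercise the formulas.

```python
import math


def ratio_r(r_xz, r_yw, r_xw, r_yz, r_xy, r_zw, v_x, v_y, v_z, v_w):
    """Equation (1): approximate correlation between the ratios x/y and z/w."""
    num = r_xz * v_x * v_z + r_yw * v_y * v_w - r_xw * v_x * v_w - r_yz * v_y * v_z
    den = math.sqrt((v_x**2 - 2 * r_xy * v_x * v_y + v_y**2)
                    * (v_z**2 - 2 * r_zw * v_z * v_w + v_w**2))
    return num / den


def r_x_with_z_over_x(r_xz, v_x, v_z):
    """Equation (2): correlation of x (IQ) with z/x (AQ = EQ / IQ)."""
    return (r_xz * v_z - v_x) / math.sqrt(v_z**2 - 2 * r_xz * v_x * v_z + v_x**2)


def r_equal_variation(r_xz):
    """Equation (3) for V_x == V_z, written as -sqrt((1 - r) / 2).

    This equals (r - 1) / sqrt(2 (1 - r)) for r < 1 and gives the
    limiting value 0 at r = 1 instead of dividing zero by zero.
    """
    return -math.sqrt((1 - r_xz) / 2)


# Equation (1) with y = 1 (V_y = 0, no correlations with y) and w = x
# (V_w = V_x, r_xw = 1, r_zw = r_xz) should reproduce equation (2).
r, vx, vz = 0.6, 0.15, 0.12
general = ratio_r(r_xz=r, r_yw=0.0, r_xw=1.0, r_yz=0.0, r_xy=0.0, r_zw=r,
                  v_x=vx, v_y=0.0, v_z=vz, v_w=vx)
special = r_x_with_z_over_x(r, vx, vz)
print(round(general, 4), round(special, 4))

# Sign of r_IQ,AQ: negative whenever V_z < V_x, positive only if r_xz * V_z > V_x.
print(r_x_with_z_over_x(0.7, 0.15, 0.10), r_x_with_z_over_x(0.95, 0.15, 0.20))
```

The general formula is kept separate from the reduced forms so that the substitutions described in the text can be checked against each other rather than assumed.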
THE COLLEGE PSYCHOLOGY TEST¹
J. E. BATHURST 


Bureau of Public Personnel Administration, Washington, 
District of Columbia 


AND 


NORMA V. SCHEIDEMANN 
Chicago, Illinois 


INTRODUCTION 


The fruition of the educational testing program during the last quarter of a century is now affecting college work. For some time there was a general reluctance among college instructors to encourage the development of mechanical testing devices, and a disinclination toward the standardization of college course content. The objections commonly presented to discourage the development of standardized college tests were the impossibility of standardizing the work in courses generally taught by the individual college professor and the undesirability of eradicating the individuality of the various colleges.

However, the accrediting and approving of colleges and universities and the interacceptance of college credits, combined with the more specific movement to section college psychology students on the basis of ability (a movement in which Dean Carl E. Seashore of the University of Iowa stands out so prominently), pointed to the desirability of an instrument which would measure in standardized units the product of the classroom in college psychology. In the hope of meeting this need to some degree, the writers developed a test covering first-year psychology in the liberal arts college. In devising such a test, the aim was not to test the contents of a particular text but rather to measure knowledge of the scientific principles included in the field as a whole.


CONSTRUCTING AND TRYING OUT THE TESTS 


The test is divided into six tests, each covering one division of psychology. The divisions chosen may appear arbitrary, but it is believed that they are logical and comprehensive and will assist materially in diagnosis. In the main, controversial material has been discarded. A few facts quite prominent in psychological discussions were included although they are not wholly free of controversy. The items in each test were intended to be a sampling of one division of psychology.

¹ The writers wish to thank the teachers and college psychology students who so liberally coöperated in developing the tests.

TABLE 1
Showing the distribution of total scores of 347 college psychology students (frequency distribution, with highest possible score, median score, coefficient of correlation (Pearson r), and reliability (Brown))

TABLE 2
Showing the Mean (M), the Number of Cases (N), the Standard Deviation (S.D.), the Coefficient of Correlation (r, odds vs. evens), the Probable Error (P.E.), and the Reliability Coefficient (Brown) of the Scores for Each Test and for the Total Scores

TEST NAME                                   |  M   |  N  | S.D.  | r (odds vs. evens) | P.E.  | Reliability (Brown)
Psychological and Neurological Psychology   | 20.8 | 100 | 5.…   | …                  | 0.032 | …
Abnormal Psychology                         | 11.7 | 100 | 6.…   | …                  | 0.056 | …
Animal and Child Psychology                 |  9.5 | 100 | 2.…   | …                  | 0.061 | …
Social and Applied Psychology               | 13.4 | 100 | 6.…   | …                  | 0.060 | …
Systematic Psychology                       |  7.3 | 100 | …     | …                  | 0.066 | …
Experimental Psychology                     |  6.2 | 100 | 6.…   | …                  | 0.055 | …
Test as a whole                             | 76.4 | 100 | 17.…  | …                  | 0.034 | …


After the material was selected, cast in true-false or multiple-choice form, organized into tests, and mimeographed, the tests were given to nearly four hundred first-year psychology students in various colleges and universities. Each college or university followed the standard directions for administering the tests. The tests were returned to the authors for scoring.


EXPERIMENTAL RESULTS 


Tables 1 and 2 show the results thus obtained. 

It is observed from table 1 that the frequency distribution of total scores is quite satisfactory. The median score is a little low, since it is less than one-half the highest possible score, although not seriously so. The measuring units of the tests are adequate, in that the lowest score is more than one standard deviation above zero and the highest score is more than one standard deviation below the highest possible score, and the range of scores is about two-thirds of the possible range. It is observed from table 2 that the reliability of each test is quite satisfactory, the only possible exception being the test on Systematic Psychology, which is a little low; when the number of items is considered, however, the low reliability is not serious. The reliability of the test as a whole is also satisfactory, being 0.80.
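The statistics named in table 2 and the checks applied to table 1 can be sketched in Python. This sketch assumes that “Reliability (Brown)” is the Spearman-Brown step-up from the odd-even correlation, that the probable error of r is the classical 0.6745(1 − r²)/√N, and that the possible range runs from zero to the highest possible score. The use of the population S.D. (`statistics.pstdev`), the function names, and the sample scores are assumptions made for illustration, not the writers’ data or procedure.

```python
import math
import statistics


def brown_reliability(r_half):
    """Spearman-Brown step-up: full-test reliability from the odd-even correlation."""
    return 2 * r_half / (1 + r_half)


def half_test_r(reliability):
    """Inverse of brown_reliability: the odd-even r implied by a full-test reliability."""
    return reliability / (2 - reliability)


def probable_error_of_r(r, n):
    """Classical probable error of a product-moment r computed from n cases."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def distribution_checks(scores, highest_possible):
    """The checks applied to table 1 in the text (population S.D. assumed)."""
    sd = statistics.pstdev(scores)
    lowest, highest = min(scores), max(scores)
    return {
        "median_below_half_of_maximum": statistics.median(scores) < highest_possible / 2,
        "lowest_more_than_1_sd_above_zero": lowest > sd,
        "highest_more_than_1_sd_below_maximum": highest < highest_possible - sd,
        "fraction_of_possible_range_used": (highest - lowest) / highest_possible,
    }


# A whole-test reliability of 0.80, as reported in the text, implies an
# odd-even correlation of about two-thirds between the half-tests.
r_half = half_test_r(0.80)
print(round(r_half, 3), round(brown_reliability(r_half), 3))
print(round(probable_error_of_r(r_half, 100), 4))

# Hypothetical scores, for illustration only.
print(distribution_checks([30, 40, 50, 60, 70], highest_possible=150))
```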

It is assumed that the factual information called for in the test is a satisfactory sampling of the facts taught in a first-year college psychology class in the liberal arts college, and hence that the test’s reliability may be taken as its validity.


EXCERPTS FROM THE TESTS 


To show the nature of the material included in the tests a few items 
from each test selected at random are given below. 


Test 1. Psychological and Neurological Psychology

1. The unit of the nervous system is the: (1) nervous arc (2) synapse (3) neurone (4) nucleus
2. The receiving end of the neurone is called the: (1) axon (2) dendrite (3) end-brush (4) collateral
3. Overactivity of one of the pituitary secretions causes: (1) an abnormal growth of the skeleton (2) cretinism (3) goiter (4) anger


Test 2. Abnormal Psychology

1. The chronic symptom of senile dementia is loss of memory of recent events
2. A condition characterized by an overpowering will is known as aboulia
3. Feeblemindedness suggests or signifies delimitation rather than deterioration
4. Hysterical patients are extremely resistant to suggestion
5. The mental development of an imbecile corresponds to that of children two to eight years of age


Test 3. Animal and Child Psychology

1. That animals learn primarily by the trial and error method is the contention of: (1) Wundt (2) Thorndike (3) Yerkes (4) Koffka
2. The one among the following which infants are instinctively afraid of is: (1) lightning (2) loud noises (3) animals (4) the dark
3. Most animal behavior may be explained on the basis of: (1) reason (2) instinct (3) intuition (4) imitation


Test 4. Social and Applied Psychology

1. The sense of individual responsibility is increased in …
2. Intelligence tests give predictive measurement of leadership qualities
3. The selection of men by personal interview is conceded to be the most reliable method
4. The widest application of psychology is at present in …
5. Positive suggestions are usually more effective than …


Test 5. Systematic Psychology

1. The theory which divides the mind into independent operations is called: (1) monistic (2) dualistic (3) functional (4) faculty
2. The word among the following which is synonymous with affection, in the technical language of psychology, is: (1) feeling (2) emotion (3) sentiment (4) …
3. The genetic theory of color vision was suggested by: …


Test 6. Experimental Psychology

1. There is usually a negative correlation between an individual’s IQ and his scholastic success
2. The typical learning curve shows very slow progress at first with constant acceleration in achievement with practise
3. Small angles are usually overestimated
4. Localization of sound is best in the median plane
5. Weber’s law states that with the arithmetic increase of the stimulus the sensation increases geometrically


A FURTHER NOTE ON THE DIFFERENTIAL IQ’S OF SIBLINGS


J. H. McFADDEN 
(University of Pittsburgh) 


PREVIOUS WORK 


In 1926 there appeared an article by Dr. Grace Arthur¹ showing what appeared to be a tendency for younger children to have higher IQ’s than their older siblings. The present writer had been of the opinion that this tendency existed, and in investigating the matter he came across Dr. Arthur’s report. Since then he has compiled a number of cases to see whether or not they confirm Dr. Arthur’s position.

Dr. Arthur’s figures may be briefly summarized as follows: 








Average IQ’s

FAMILIES   SIBS   OLDEST   NEXT   NEXT   YOUNGEST
   19        4     77.2    80.9   87.4     93.5
   85        3     82.7    90.3            95.3
  271        2     89.3                    96.9
She presents evidence to show that various extraneous factors have been ruled out (the chance that the lower year-levels may be unduly easy, the chance of coaching by older sibs, and better command of the English language), so that the differences are really differences in the abilities of the subjects. In all of the tests the Kuhlmann 1917 Revision of the Binet-Simon scale was used. To account for the differences in the IQ’s of the siblings, she suggests that they may be due in part to the changing culture in the homes of immigrants, as a large proportion of the subjects were children of immigrants.

¹ Grace Arthur (of the Child Guidance Clinic, St. Paul, Minn.): “The Relation of IQ to Position in Family,” Jour. Ed. Psychol., 1926, 17, pp. 541-550.

The subjects whose IQ’s appear in the writer’s tables are all, 
or practically all, of native stock, since the State of North Carolina is extremely homogeneous, with a marked absence of foreign population. Since this difference in IQ’s still appears, the writer is of the opinion that the explanation suggested by Dr. Arthur does not sufficiently cover the case.


PRESENT RESULTS


In the writer’s tables, the IQ’s were all obtained with the 
Stanford Revision of the Binet-Simon scale with the exception 
of those found in table C, where the National Intelligence Test 
was used. 

In table A all of the cases were examined by the writer and 
include subjects from the following groups: pupils from the 
first, second, and third grades of an exceptionally good public 
school, the IQ’s of all of the pupils tested in the school averag- 
ing around 109; backward pupils in another school, the pupils 
being examined at the request of the teachers; children in an 
orphanage; and children in the North Carolina Training School 
for Mental Defectives. Although selective factors operate in 
the various groups, the writer tried to rule out as far as possible 
all cases except those where the factors operated equally on all 
the siblings in the same family. 

Table B includes cases from the files of the Bureau of Mental 
Health and Hygiene of the North Carolina State Board of 
Public Welfare. Dr. H. W. Crane, Director of the Bureau and 
a Professor of Psychology in the University of North Carolina, 
very kindly gave permission for the use of this material. These 
cases are mainly clinic and institution cases, which accounts 
for the low average IQ. Most of the cases were tested by the 
Director of the Bureau, the others being tested by other examin- 
ers under his direction. 

Table C contains IQ’s obtained by the use of group tests (the National Intelligence Tests) of children in a public school. The writer is indebted to Dr. A. M. Jordan, of the University of North Carolina Faculty, for the use of these cases.


TABLES OF IQ’S SUBMITTED BY THE WRITER

The columns in the tables are as follows: I, number of siblings in the family under consideration; II, number of families; III, average IQ of the oldest sibling in the family; IV, average IQ of the next oldest; V, of the next oldest; and so on to the youngest.

Table AA corresponds to table A, giving the standard deviations where the number of cases seemed to justify them; table BB corresponds to table B; table AABB corresponds to table AB. The data were not worked up for table C.

TABLE A
I    II    III    IV     V      VI     VII    VIII
5     1    55.0   74.0   80.0   82.0   82.0
4     1    60.0   71.0   63.0   98.0
3    10    74.0   81.7   89.8
2    31    90.7   94.2   (S.D. of difference, 5.28)

TABLE B
I    II    III    IV     V      VI     VII    VIII
6     1    44.0   40.0   51.0   93.0   50.0   60.0
5     1    39.0   36.0   47.0   64.0   72.0
4     4    62.3   66.5   73.8   78.8
3    28    67.1   72.7   77.9
2    95    68.8   74.6   (S.D. of difference, 2.43)

TABLE AB
I    II    III    IV     V      VI     VII    VIII
6     1    44.0   40.0   51.0   93.0   50.0   60.0
5     2    47.0   55.0   63.5   73.0   77.0
4     5    61.8   67.4   75.6   82.6
3    38    68.9   75.1   81.0
2   126    74.2   79.6   (S.D. of difference, 1.75)

TABLE C
III    IV
81.0   75.0
92.1   94.2
92.8   92.5

TABLE AA
I    II    III            IV             V
3    10    21.71 ± 3.26   16.84 ± 2.53   12.35 ± 1.85
2    31    21.84 ± 1.83   19.72 ± 1.69

TABLE BB
I    II    III            IV             V
3    28    12.94 ± 1.16   10.06 ± 0.91   13.26 ± 1.19
2    95    16.51 ± 0.81   17.01 ± 0.83

TABLE AABB
I    II    III            IV             V
3    38    16.02 ± 1.27   12.84 ± 1.01   14.94 ± 1.18
2   126    14.01 ± 0.59   13.11 ± 0.55
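The tables are linked by arithmetic the paper does not spell out. The Python sketch below tests one reading of that arithmetic, which reproduces several printed values. Under this reading, table AB rows are family-weighted means of the corresponding rows of tables A and B, the “S.D. of difference” for tables A and B is √(σ₁²/N + σ₂²/N) computed from the S.D.’s in tables AA and BB, and the ± entries of table BB are probable errors of a standard deviation, 0.6745σ/√(2N). This is an inference from the numbers, not a method stated by the writer, and the function names are hypothetical.

```python
import math


def combined_row(row_a, families_a, row_b, families_b):
    """Family-weighted mean of two rows of average IQs, rounded to one decimal."""
    total = families_a + families_b
    return [round((a * families_a + b * families_b) / total, 1)
            for a, b in zip(row_a, row_b)]


def sd_of_difference(sd_older, sd_younger, n):
    """sqrt(sd1**2 / n + sd2**2 / n): standard error of the difference between
    two means, treating the older and younger groups as uncorrelated."""
    return math.sqrt((sd_older ** 2 + sd_younger ** 2) / n)


def probable_error_of_sd(sd, n):
    """Classical probable error of a standard deviation from n cases."""
    return 0.6745 * sd / math.sqrt(2 * n)


# Three-sibling families: table A (10 families) and table B (28 families).
print(combined_row([74.0, 81.7, 89.8], 10, [67.1, 72.7, 77.9], 28))
# Two-sibling families, using the S.D.'s printed in tables AA and BB.
print(round(sd_of_difference(21.84, 19.72, 31), 2))   # compare table A
print(round(sd_of_difference(16.51, 17.01, 95), 2))   # compare table B
print(round(probable_error_of_sd(16.51, 95), 2),
      round(probable_error_of_sd(17.01, 95), 2))      # compare table BB
```

The same formula gives 1.71 rather than the printed 1.75 for table AB, so the match is not complete. Because the IQ’s of siblings are correlated (0.82 in table A), a standard error that subtracted the 2rσ₁σ₂/N term would be smaller. The uncorrelated form is used here only because it matches the printed 5.28 and 2.43.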

















DISCUSSION 


a. The writer has no thesis to uphold in this presentation of 
cases, but simply wishes to confirm the results noted by Dr. 
Arthur. 

b. The use of individual tests undoubtedly shows a tendency
for younger children to have higher IQ’s than their older
siblings. This may be due to faults in the scales, or may be 
a manifestation of an actual difference in ability. If we accept 
the theory of the constancy of the IQ, it would seem that there 
is an actual difference in ability. 

c. The standard deviations of the older siblings have a 
tendency to be higher than those of the younger. This was 
also found by Dr. Arthur. If it be true that the older are more 
variable, it is not strange that they furnish a larger proportion
of “geniuses” than do the younger, as some investigators seem
to find. 

d. The standard deviations presented in table AA are extremely high, due mainly to the fact that the groups represented there are widely separated in ability. The IQ’s vary considerably, but the IQ’s of the pairs of siblings vary together, as shown by a coefficient of correlation of 0.82 ± 0.04 between the IQ’s of the younger and older siblings in the two-sibling families in table A. It may be noted here that the correlations between the younger and older siblings in the other two-sibling groups are 0.74 ± 0.05 for table B and 0.90 ± 0.013 for table AB. These are extremely high coefficients, but the writer believes them accurate.

e. The results from the use of the Terman Revision agree among the different examiners, and the Kuhlmann and Terman Revisions agree with each other, but both disagree with the results from the group tests. The writer does not know why this is so, unless the scores on the group tests are more dependent on schooling than are the scores on the individual tests, which might cause a “leveling” of the scores of school children. Jordan² has found that the average IQ’s obtained by the use of these (National Intelligence) tests were consistently higher when the tests were given in the spring than when they were given in the fall, which might indicate that pupils were “rusty” after the summer vacation.

f. It is also of interest to note that the IQ’s of the oldest children are progressively higher as the size of the family diminishes, and the same holds true of the next oldest, etc. Thus, in table AB, the average IQ of the oldest children is 61.8 in the four-sibling families, 68.9 in the three-sibling families, and 74.2 in the two-sibling families. The summary table compiled from Dr. Arthur’s report shows the same tendency. It may be that this is simply the result of increasingly large numbers of small-family cases included in the school-children group and increasingly large numbers of large-family cases included in the clinic and institution groups. If this be true, the normal difference between the IQ’s of school children and those of clinic and institution cases would explain the apparent tendency. However, the cases in table B do not seem to bear out this explanation.

² A. M. Jordan: Educational Psychology, Henry Holt and Co., New York, 1928, p. 323.








NOTES AND NEWS 


The Ninth Educational Conference will be held at Ohio State University, Columbus, Ohio, on April 4, 5 and 6, 1929. Those who plan these conferences seek to give each one a keynote; the announced keynote for this one is “Evaluating Education.” Among the subjects of most interest to students of applied psychology are the following: “Individual Differences in Relation to Personality,” Joseph Jastrow; “Does Education Affect the Child’s Level of Intelligence?” F. N. Freeman; “Listening Ability: Its Importance, Development and Measurement,” Paul T. Rankin; “Achievement Testing,” A. Coleman; “Why Stop Learning?” Chancellor E. H. Lindley; “Formal and Informal Testing of Educational Attainments,” Arthur I. Gates.


The National Conference of Social Work will hold its fifty-sixth meet- 
ing in San Francisco, California, June 26 to July 3, 1929, under the lead- 
ership of President Porter R. Lee, Director of the New York School of 
Social Work. The program deals with child welfare, community life,
delinquency, health, immigration, mental hygiene and similar social
problems. Those interested may write to Howard R. Knight, General 
Secretary, 277 East Long Street, Columbus, Ohio. 


The annual meetings of the National Vocational Guidance Associa- 
tion were held in Cleveland February 20 to 23. The groups meeting at 
this time included the National Vocational Guidance Association, 
National Association of Appointment Secretaries, National Committee 
of Bureaus of Occupation, National Association of Deans of Women, 
College Personnel Officers, Personnel Research Federation, and
representatives from the American Management Association, Deans of Men,
American Association of Collegiate Registrars, American Council on 
Education and other interested societies. Those interested in securing 
reports of these meetings may communicate with Agnes B. Leahy, 
Personnel Research Federation, 29 W. 39th Street, New York City.


Industrial Psychology Monthly, the magazine of manpower, which 
has been edited for industrial executives by Dr. Donald A. Laird since 
January, 1926, suspended publication in its present form with the issue 
of December, 1928. It is understood that later in 1929 the title and
editorial policies of the periodical will reappear combined with a
long-established business magazine which has a circulation of more than
20,000 copies monthly. During the three years of publication of Indus- 
trial Psychology Monthly the editorial treatment was directed toward 
the upper strata of executives; the new combination is expected to direct 
its principal appeal to foremen and superintendents. The peak circula- 
tion of Industrial Psychology Monthly was 1,300 copies. 


Dr. Donald A. Laird, who until recently was the editor of Industrial
Psychology Monthly and director of the Colgate Psychological Labora- 
tory, has been appointed Chief of Scientific Staff of the Personnel Analy- 
sis Bureau of Chicago. Dr. Edward G. Stoy, recently of Carnegie 
Institute of Technology, is Deputy Chief. Other members of the scien- 
tific staff are Dr. Edward K. Strong of Stanford University, Dr. Forrest 
A. Kingsbury of the University of Chicago, and Dr. John L. Stenquist 
of the Baltimore public schools. 

The Bureau provides a personal psychological test service for busi- 
ness and professional men and women. No vocational guidance or 
employment testing work is undertaken, the Bureau referring requests 
for test service of this sort to individual psychologists. 


Wesleyan University, Middletown, Conn., has granted a leave of 
absence for the second semester of this year to Dr. Carney Landis to 
carry on research with Dr. Lashley at the Institute for Juvenile Research,
Chicago. Dr. T. A. Langlie is Acting Chairman of the Wesleyan
Department. 


The tenth annual report of the Commonwealth Fund, of New York 
City, announces appropriations totalling $2,083,621.80 during the last 
year in furthering a wide range of public health, mental hygiene, child
welfare, and educational activities. One of the principal projects of 
the Fund, the provision of fellowships for British graduate students in 
American universities, received a total of $198,150 for the year. There 
are now fifty-one fellows studying at seventeen colleges and universities, 
making eighty-five in all who have studied and traveled in this country 
since 1925. For the past six years a program for the development of 
child guidance clinics and visiting teacher service in the public schools 
has been carried on with important results, both in this country and in 
England. 








BOOK REVIEWS 


J. E. Wallace Wallin. Clinical and Abnormal Psychology. Houghton
Mifflin Co., The Riverside Press, Cambridge, Mass., 1927. xxii + 
649 pp. 

The book is divided into four parts. Part I deals briefly with the 
different viewpoints and methods obtaining in present day psychology; 
different types of variation in the individual expressed in various “ages,”
such as chronological age, physiological age, etc.; and the aims, methods, and
general principles of the psychoclinical examination. Part II, on 
intelligence, besides a rather full discussion of individual and group 
intelligence tests, contains chapters on sensory defects, keenness of 
sensitivity, attention, imagery, association and other topics. It closes 
with a chapter on thought, language, speech defects and reasoning. 
Part III is entitled “Motility” and Part IV “Emotivity.” Mental
abnormalities and defects are discussed insofar as they are germane to 
the purpose of the book. There are a number of illustrations and 
tables. 

While much of the content of the book is not new, the author has 
performed a real service in collecting and organizing for the student a 
considerable amount of valuable material that has hitherto been scat- 
tered and hence not readily available. From his own extensive clinical 
experiences and researches the author gives a number of suggestions 
and presents some original data. The book is a careful and methodical 
piece of work, based to a large extent upon investigations to be found in 
the literature. The historical perspectives presented in the discussion
of a number of the topics will appeal especially to the careful scholar, 
although the nature of the book precludes their being more extensive 
than they are. One feels that it is a solid work, as sound as it could 
well be in view of the present stage of development of clinical psychology. 

Amos C. Anderson,
Ohio University. 


Florence L. Goodenough. The Kuhlmann-Binet Tests, A Critical
Study and Evaluation. University of Minnesota. The Institute 
of Child Welfare. Monograph Series No. II, 1928. 146 pp. 

As the title suggests Dr. Goodenough has made a critical study and 
evaluation of the Kuhlmann-Binet Tests, particularly with respect to
their reliability. A total of 495 children, aged two to four years,
were given at least one examination, and 393 of these were given two
examinations. A study of these retests was made to determine the
reliability of the scale for the age levels mentioned and also to throw light
upon the value of the particular tests within the scale. Interesting 
facts are brought to light concerning the variation of intelligence with 
occupational level; the relative gains on retests of the various IQ levels; 
the placement of the individual tests in the scale; and the effect of such 
misplacement upon changes in IQ. The comments upon specific tests 
are especially valuable in the subjective analysis of any particular test 
results with a given child. 

The experimental procedure is careful and as accurate as is possible 
in a field where many variables abound. Dr. Goodenough has chosen
her subjects from definitely known sources rather than trusting to chance 
selection. This seems to the reviewer highly commendable. The 
reader will find this Monograph an excellent piece of work in a field which
testing has invaded with only partial success. Dr. Goodenough’s 
suggestions and study may prove of considerable aid in building intelli- 
gence tests for the younger age levels which are more reliable than those 
which have been constructed up to the present. 

S. M. Stoke,
Ohio University. 


Clark L. Hull. Aptitude Testing. World Book Company, Yonkers-on-Hudson,
New York, 1928. 535 pp.

The book is divided into two parts, but the reviewer is impressed with
three distinct aspects: the historical phases of aptitude testing (including
the “re-explosion” of pseudo methods of aptitude testing such
as phrenology, physiognomy, etc.); the theoretical aspects of aptitude
testing; and, third, the practical application of these theoretical aspects
to the problem of actually discovering and measuring aptitudes. 

One regrets the necessity for the inclusion of the discussion of ana- 
tomical and other alleged signs of aptitude. However when one con- 
siders the amount of belief in those signs which is present among our 
so-called “educated people,” one is forced to admit the necessity for the
inclusion of such material. The author has reviewed the cases for and 
against a number of such hoary frauds as phrenology and disposes of 
them with scientific precision. 

The discussion of the theoretical aspects of aptitudes seems to the 
reviewer to be a very significant contribution. The author adopts a 
“strict group-factor theory” as to the nature of aptitudes. His position
lies between that of Spearman’s “G factor” plus specific factors as the 
composition of all aptitudes and the alleged Thorndikian view that 
aptitudes are composed of specific abilities. The author maintains his 
position by demonstrating mathematically the possibility of his theory. 
Finally, the author applies his theory to the problem of actually measuring aptitudes. Common errors of collecting and treating data
are examined and better methods shown. Systematic handling of 
various types of tests and data is explained in detail and with accuracy. 
The author makes abundant use of statistics but explains his work so 
well that the reader is in danger of beguiling himself with the belief that 
he himself is becoming quite a statistician. However, the author avoids the error so often made by statisticians and does not regard mathematical treatment of data as sufficient redemption from all the sins that may be committed in collection. Formulas are not regarded as conjurers of facts from empty hats but as useful tools when applied in the right manner to the type of material to which each is adapted.

No doubt any reader can find fault with any book. The reviewer 
placed a few question marks in the margins as he went through, but they 
indicate only minor quibbles and are of so little significance as to be out 
of place in a review of so excellent a book. It is seldom that humor 
enters into a scientific work of this sort, but the author has unconsciously introduced a chuckle (for all who are not behaviorists) by attempting to simplify the definition of “subjective scales.” Let me bait the reader’s interest by giving him the page number: 393. However, let no one think that the book is in any way controversial; it is scientific, readable, and an able contribution to a field where good texts are few.

Stuart M. Stoke,
Ohio University. 


Carl Murchison. Criminal Intelligence. Worcester, Mass.: Clark University.

William T. Root. A Psychological and Educational Survey of 1916
Prisoners in the Western Penitentiary of Pennsylvania. Board of 
Trustees of the Western Penitentiary. 246 pp. 

These two recent studies of prison populations raise anew the issue 
of the validity and meaning of tests of adult intelligence and make emphatic the need for further analysis and standardization. Murchison, using the Army Alpha, concludes that convicted criminals as a body are equal or superior in intelligence to the general population. Root, employing the Stanford-Binet individual examination, finds his prison group to rank very low in intelligence.

Murchison classifies the long list of legal crimes into seven groups, 
the point of view of the classification being legal and logical rather than 
psychological; certain of the groups would appear to be very heterogeneous, as for instance the one called “crimes of physical injury,” within
which both murder and arson are included. Intelligence scores varied 
decidedly with the type of crime, those convicted of crimes of decep- 
tion and fraud standing highest and those convicted of sex crimes lowest. 
The distribution of the various crimes among the prisoners showed 
racial-national variations. Thus, among foreign Whites, crimes of
physical injury are 13.3 per cent more common than among native 
Whites, the difference being due largely to the frequency of this type of 
crime with the Italians; the latter however are characterized by virtual 
absence of crimes of social dereliction (desertion of family, etc.). Murchison makes the interesting suggestion that the “Southern Negroes in types of crime committed compare to the Northern and Western Europeans.” If we arrange in order of frequency the types of crime committed by each local-national group, it appears that the Northern and
Western Negro is closely similar to the Northern native White, the 
English- and German-speaking foreign White, and the native White 
total; and that he resembles these groups distinctly more than he does 
the Negro total, the Southern and Borderland Negro, and the Southern 
and Western White. It would be very interesting and valuable to compare the incidence of crime with the total Negro population of each of these areas, as Root has done in Pennsylvania, to determine whether the degree of similarity to the Whites would show local variations in this
respect also. The Southern Negro and the Italian are closely similar in 
the high frequency of crimes of physical violence and the very low fre- 
quency of crimes of social dereliction. 

The prisoners as a whole are very young, especially the thievery and 
statutory-offense group. “As far as maturity is concerned this group is almost like a Freshman class in college. Mere youths, yet carved with the stigma of everlasting shame!”¹ Recidivists tend to be superior
in intelligence to first offenders, although a number of sub-groups exist 
in which the reverse is the case; this relationship varies somewhat with 
the type of crime, first offenders scoring higher in fraud and sex crimes. 
The Alpha scores vary also with the literacy of the prisoners; the latter 
moreover varies with type of crime and with race. Native White and 
Negro prisoners are much lower in literacy than corresponding army
groups. 

The 244 pages of presentation of data are in the main composed of 
articles elsewhere published and here assembled with little revision. 
Of the balance of the book, twelve pages are devoted to a statement of 
“pre-war contemporary opinion,” five pages to the “idea that criminals are feeble-minded” (which the writer regards as very prevalent), six pages to a discussion of the extent to which the army norms are representative, and a final three pages to a discussion headed “The Prevailing Fallacy of Materialism,” containing what appear to be the author’s
personal convictions in the matter of social control. This last section 
is singularly divorced from the findings of the study as well as from the 
voluminous literature dealing with the complexity of the causes of crime 
in which low intelligence has sometimes been denied as a marked differ- 
entiating characteristic of inmates of correctional institutions. 





1 The quotation is indicative of the temper of this section. 
Root’s paper is a compact presentation of the results of an intensive
study of 1916 prisoners made with the aid of the prison psychiatrist and 
of graduate students so that from eight to twelve hours of individual 
case work were done on each man. 

The median IQ for all races and all types of crime is 76.2. Root’s 
findings are similar to Murchison’s in point of the existence of wide 
variations in intelligence with type of crime, race and age; they also 
reveal a preponderance among his prisoners of very youthful men. Of 
the leading racial-national groups the Whites stand highest with a 
median IQ of 77.4; the Italian, Polish-Russian, and Austrian follow 
with median IQ’s of approximately 71, and the Negro is lowest 
with a median of 68.3. Among the more intelligent prisoners of all races 
crimes against property are relatively frequent, the group of White em- 
bezzlers with a median IQ of 103.75 standing 15 points above any other 
group. Sex crimes and arson tend to be committed by the most inferior. 
Exceptions to this, however, occur in the Italian “felonious assault” division, which leads the Italian group, and in the rape groups, which
stand relatively high in the Austrian and Russian-Polish groups and 
to a lesser extent in the Negro group. The native White group shows 
a relatively wide distribution of intelligence with marked contrast 
between different crime groups; the other geographical-nationality- 
race groups show in overwhelming majority borderline or lower in- 
telligence quotients and the contrasts between different crime groups 
are much less. A significant factor in these findings is the almost 
entire absence of the highest group, viz., embezzlers, among any save 
the native Whites. With the conspicuous exception of the embezzle- 
ment group the older men tend to have lower IQ’s. Robbery is the type 
of crime committed by intelligent younger men, 77 per cent being of 
normal or superior intelligence. The average native White rape case 
is older. 

As regards recidivism, Root believes that the available information is inaccurate because of the inadequacy of recording systems together with the tendency of the men to conceal past offenses where possible. “The predatory groups surpass all groups in the percentage of recidivism, an average of 63 per cent being recidivistic.” Although Murchison is more interested in relating recidivism to intelligence, the data which he presents yield in the main similar indications.

A very interesting part of Root’s study is his consideration of the 
percentage among the native White, Negro and Italian prison groups of 
each type of crime and his investigation of the incidence of crime in each 
of these groups, which latter is essentially a comparison of the per cent 
of each group in the prison with the per cent in the population of the 
counties from which the penitentiary draws its prisoners. Within the 
prison group the native White has committed the greatest range of 
crimes. Of his crimes 67.1 per cent are predatory (crimes against 
property), 16.3 per cent violence and 14.2 per cent sex. The corresponding figures for the
Negro group are 53.7, 40 and 5.9; and for the Italian, 18.3, 69.3 and 10.2. 
But when relationship to the populace at large is taken into considera- 
tion and the incidence of the other groups is compared with that of the 
native Whites it appears that 13.69 times as many Negroes and 3.45 
times as many Italians are imprisoned as would be the expectation if 
the numbers in the penitentiary bore the same relation to numbers in 
the population as obtains with the native Whites. Taking the per cent 
of the native Whites imprisoned for a given type of crime as unity the 
proportionate frequencies of the various types of crime among the Ne- 
groes are: predatory, 11.03; violence, 34.22; sex, 5.48. The correspond- 
ing figures for the Italian group are 0.98, 14.88 and 2.30. The author 
warns the reader against attributing these findings to racial traits of a 
biological or hereditary nature. “The Negro criminal . . . is the victim of a vicious circle of social, biological and economic causes; lack of education, no trade training commensurate with the intelligence he has; a set of moral, social and leisure habits adjusted to a rural Southern community, a victim of caste, forced to live in discarded houses of the dominant race, restricted in employment and social opportunity, the Negro is daily forced to feel inferiority and humiliation in a thousand ways. . . . Debarred from this and that by a thousand social taboos, the lot of the Negro is unparalleled in the experience of any race.” And
in the matter of violence as an alleged racial characteristic of the Italians 
the significant fact is brought out that the group of native White offend- 
ers whose parents were born in Italy shows a distribution of types of 
crime which is practically identical with that obtaining in the native
White group at large. 
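The comparison in which the native White rate is taken as unity amounts to a ratio of imprisonment rates: the proportion of a group's population held in the penitentiary divided by the corresponding proportion for the reference group. Applying the same ratio to the counts for a single type of crime gives per-type figures of the kind Root reports. The sketch below is a minimal illustration assuming simple per-head rates; the function name `rate_ratio` and every figure in it are hypothetical, not Root's data or his exact procedure.

```python
def rate_ratio(group_count: int, group_population: int,
               ref_count: int, ref_population: int) -> float:
    """Imprisonment rate of a group relative to a reference group's rate.

    A value of 1.0 means the group is imprisoned at the same rate per head
    of population as the reference group (the reference is "unity").
    """
    if group_population <= 0 or ref_population <= 0 or ref_count <= 0:
        raise ValueError("populations and reference count must be positive")
    group_rate = group_count / group_population
    ref_rate = ref_count / ref_population
    return group_rate / ref_rate


# Hypothetical figures for illustration only (not taken from Root's survey).
group_population, ref_population = 10_000, 200_000
group_prisoners = {"total": 60, "violence": 30}
ref_prisoners = {"total": 400, "violence": 80}

# The same ratio serves for all prisoners and for one type of crime.
for crime_type in group_prisoners:
    ratio = rate_ratio(group_prisoners[crime_type], group_population,
                       ref_prisoners[crime_type], ref_population)
    print(f"{crime_type}: {ratio:.2f}")
```

Run as written, it prints 3.00 for the total and 7.50 for violence: the hypothetical group is imprisoned at three times the reference rate overall and at seven and a half times the reference rate for crimes of violence.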

Of great practical importance is the part of the investigation which 
deals with the status of the prisoners as regards education and training. 
Basing his estimates on public school norms of progress to be expected 
in each intelligence group Root concludes that from 51 to 100 per cent 
of the various groups have received quite inadequate schooling. The 
average prisoner is capable of three years more schooling. The native 
Whites come nearest to having had an education proportionate to their 
intelligence indication, the Negroes next and all foreign groups least. 
Similarly with trade training; from 35 to 76 per cent of the borderline 
or better groups have received inadequate trade training, the per cent 
being least for native Whites and greatest for the Italians. Occupa- 
tional training is the ultimate need of the prisoner; tasks assigned him 
are of value in proportion as they increase his vocational adaptability. 

Root finds from individual interviews the most conspicuous feature 
of the prisoners’ religion to be what he calls “religiosity,” or the persistence of the forms and dogmas learned in early childhood and quite
insulated from daily behavior. This the author believes is often due to 
inability to use constructive imagination in applying the ritualism 
to conduct. Marked cases of religious intensity are likely to occur in
sex offenders—rape, incestuous rape and sodomy being the principal 
charges. 

The concomitance of many indirect social factors receives attention. 
In the various crime groups percentages of from 73 to 81 of the Negroes 
come from outside of the state, most of them from the rural South. 

Root is at present engaged in re-testing prisoners who were examined 
upon their entrance to the prison school two years ago. It seems a 
reasonable conjecture that the results of such re-tests after a period of 
schooling will yield a more reliable picture of the intelligence of the 
prisoners. So far twelve cases only have been re-tested. Of these, 
nine made gains of from 5 to 27 points in the IQ, one case showing an 
advance of from 45 to 72, and another of from 64 to 81. On the whole a 
majority of the cases bore out in their educational progress the prognosis 
made on the basis of the tests. These findings illustrate both the pos- 
sible dangers of the tests when too rigidly interpreted and the usefulness 
of the norms which the psychologist can furnish the prison educator. 

The conspicuous difference existing between the conclusions of the 
two authors as regards the intelligence of prisoners as compared with 
that of the population at large springs from the methods of testing 
which they employed and the interpretation placed upon test findings. 
Root applies the well-known interpretation of the Stanford-Binet scores 
presented in Terman’s “The Measurement of Intelligence,” according to
which a mental age of 16 years is that of the average normal adult. 
If he had used as a basis of comparison the distribution of the scores of the 
army draft—especially those of the 653 native-born Whites of group X 
who were given the individual Stanford-Binet—it is obvious that the 
difference between his findings and those of Murchison as regards the 
intelligence of penitentiary inmates would have been much less. The 
median mental age of this group was 13 years and 3 months while the 
median mental age of Root’s group of native White prisoners was 12 
years and 5 months, or 10 months lower. Inasmuch as Root presents 
his intelligence data throughout in terms of the Terman intelligence 
divisions (imbecile, moron, borderline, etc.) it is impossible to rear- 
range them in terms of the calculated equivalent army mental ages or 
letter grades for purposes of comparison with Murchison’s findings. 
As it happens, however, the army A group (mental ages 18 and above) 
corresponds closely to the combined superior and very superior group 
of Terman (mental ages 17 years and 6 months, and above); the army C, 
C plus and B groups (mental ages 13 to 17.9 years) correspond to the dull 
and average normal (12 years 8 months to 17 years 5 months); the army 
C minus group (mental ages 11 to 12.9 years) matches the borderline (11
years 2 months to 12 years 7 months); and the army D and D minus 
groups (10.9 years mental age and below) match the moron and imbecile groups (mental age 11 years and 1 month and below). The reviewer presents the accompanying table for whatever suggestive value
it may contain, realizing the hazard which springs from the fact that the 
two sets of groupings are not quite co-extensive: 
The above table presents a rough comparison of the data of Root, Murchison, and certain
army draft groups (Memoirs of the National Academy of Sciences, Vol. 15). The numbers 
represent percentages of the group named at the head of the column in the intelligence divi- 
sions specified in the column at the extreme left. 
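The correspondence the reviewer draws between army letter grades and Terman's divisions, together with the 10-month gap between the two medians, can be checked with a small lookup on mental age expressed in months. This is a minimal sketch assuming the boundaries as summarized in the review, with the army C minus group taken to span 11 to 12.9 years and “12.9 years” read as “below 13 years”; the names `to_months`, `classify`, `ARMY_GROUPS` and `TERMAN_GROUPS` are hypothetical.

```python
# Lower bounds in months, highest group first (boundaries as summarized
# in the review; the two schemes are not quite co-extensive).
ARMY_GROUPS = [
    (18 * 12, "A"),
    (13 * 12, "C, C plus, B"),
    (11 * 12, "C minus"),
    (0, "D, D minus"),
]

TERMAN_GROUPS = [
    (17 * 12 + 6, "superior / very superior"),
    (12 * 12 + 8, "dull / average normal"),
    (11 * 12 + 2, "borderline"),
    (0, "moron / imbecile"),
]


def to_months(years: int, months: int = 0) -> int:
    """Convert a mental age given in years and months to whole months."""
    return years * 12 + months


def classify(mental_age_months: int, groups: list[tuple[int, str]]) -> str:
    """Return the label of the first group whose lower bound the age reaches."""
    for lower_bound, label in groups:
        if mental_age_months >= lower_bound:
            return label
    raise ValueError("mental age must be non-negative")


draft_median = to_months(13, 3)   # group X draft median cited in the review
prison_median = to_months(12, 5)  # Root's native White median

print(draft_median - prison_median)            # 10 (months)
print(classify(prison_median, ARMY_GROUPS))    # C minus
print(classify(prison_median, TERMAN_GROUPS))  # borderline
```

Root's native White median of 12 years 5 months thus falls in the army C minus group and in Terman's borderline division, while the draft median of 13 years 3 months falls in the C, C plus and B range. Because the boundaries of the two schemes do not coincide exactly (13 years versus 12 years 8 months, for instance), individuals near a boundary may be placed differently, which is the hazard the reviewer notes.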


The above comparison indicates that the differences between Root and 
Murchison as regards the intelligence level of prisoners are much reduced when an effort is made to supply a “common denominator.”
The indication remains, however, that Root’s total White group is 
somewhat lower than the army group and substantially lower than the 
Pennsylvania total White draft. This indication is supported by the 
possibility that the draft group is slightly lower in intelligence than the 
population at large, and by the fact that studies of detained juvenile 
offenders for whom more adequate tests and bases of comparison are 
available have always shown as a group some measure of inferiority. 
From the standpoint of practical measures, however, the level of 
intelligence of the penitentiary group as a whole matters very little; 
abundant evidence is at hand that the weighting and utilization of the 
factor of intelligence is to be made only in connection with the total 
personality of the individual man. If intelligence is underestimated 
a conservatism of prognosis is thereby favored which will tend to en- 
hance the value of psychological study in the eyes of the practical ad- 
ministrator. Even though the individual Binet scale may penalize 
older and extra-academic adults it nevertheless remains the best single 
instrument we have at present for predicting the amount of education which an individual can “take.” In the matter of revealing numerous social and personal elements which enter into the production of crime, Root has made an able and convincing study; and in the outlining of
social and educational procedure for the reduction of law breaking 
together with an exceptionally clear statement of a deterministic phi- 
losophy of crime he has produced an exceedingly important contribution 
to penology. 